Preface


This book is the end result of a long story that started with my involvement 
as Coordinator of the Statistical Mechanics section of the Italian Encyclo- 
pedia of Physics. 

An Italian edition collecting several papers that I wrote for the Encyclope- 
dia appeared in September 1995, with the permission of the Encyclopedia 
and the sponsorship of Consiglio Nazionale delle Ricerche (CNR-GNFM). 

The present work is not a translation of the Italian version but it overlaps
with it: an important part of it (Ch. I, II, III, VII) is still based on three
articles written as entries for the Enciclopedia della Fisica (namely:
“Meccanica Statistica”, “Teoria degli Insiemi” and “Moto Browniano”) which
make up about 29% of the present book and, furthermore, it still contains
(with little editing and updating) my old review article on phase transitions
(Ch. VI, published in La Rivista del Nuovo Cimento). In translating the
ideas into English, I introduced many revisions and changes of perspective
as well as new material (while also suppressing some other material).

The aim was to provide an analysis, intentionally as nontechnical as I was 
able to make it, of many fundamental questions of Statistical Mechanics, 
about two centuries after its birth. Only in a very few places have I en- 
tered into really technical details, mainly on subjects that I should know 
rather well or that I consider particularly important (the convergence of the 
Kirkwood-Salsburg equations, the existence of the thermodynamic limit, 
the exact solution of the Ising model, and in part the exact solution of the
six vertex models). The points of view expressed here were presented in 
innumerable lectures and talks mostly to my students in Roma during the 
last 25 years. They are not always “mainstream views”; but I am confident 
that they are not too far from the conventionally accepted “truth” and I do 
not consider it appropriate to list the differences from other treatments. I 
shall consider this book a success if it prompts comments (even if dictated 
by strong disagreement or dissatisfaction) on the (few) points that might 
be controversial. This would mean that the work has attained the goal of 
being noticed and of being worthy of criticism. 

I hope that this work might be useful to students by bringing to their at- 
tention problems which, because of “concreteness necessities” (i.e. because 
such matters seem useless, or sometimes simply because of lack of time), 
are usually neglected even in graduate courses. 

This does not mean that I intend to encourage students to look at questions 
dealing with the foundations of Physics. I rather believe that young students 
should refrain from such activities, which should, possibly, become a subject
of investigation after gaining an experience that only active and advanced 
research can provide (or at least the attempt at pursuing it over many 
years). And in any event I hope that the contents and the arguments I have 
selected will convey my appreciation for studies on the foundations that 
keep a strong character of concreteness. I hope, in fact, that this book will 
be considered concrete and far from speculative. 

Not that students should not develop their own philosophical beliefs about 
the problems of the area of Physics that interests them. Although one 
should be aware that any philosophical belief on the foundations of Physics 
(and Science), no matter how clear and irrefutable it might appear to the 
person who developed it after long meditations and unending vigils, is very 
unlikely to look less than objectionable to any other person who is given 
a chance to think about it, it is nevertheless necessary, in order to grow 
original ideas or even to just perform work of good technical quality, to 
possess precise philosophical convictions on the rerum natura. Provided 
one is always willing to start afresh, avoiding, above all, thinking one has 
finally reached the truth, unique, unchangeable and objective (into whose 
existence only vain hope can be laid). 

I am grateful to the Enciclopedia Italiana for having stimulated the begin- 
ning and the realization of this work, by assigning me the task of coordinat- 
ing the Statistical Mechanics papers. I want to stress that the financial and 
cultural support from the Enciclopedia has been of invaluable aid. The
atmosphere created by the Editors and by my colleagues in the few rooms 
of their facilities stimulated me deeply. It is important to remark on the 
rather unusual editorial enterprise they led to: it was not immediately an- 
imated by the logic of profit that moves the scientific book industry which 
is very concerned, at the same time, to avoid possible costly risks. 

I want to thank G. Alippi, G. Altarelli, P. Dominici and V. Cappelletti who 
made a first version in Italian possible, mainly containing the Encyclopedia 
articles, by allowing the collection and reproduction of the texts of which the 
Encyclopedia retains the rights. I am indebted to V. Cappelletti for granting 
permission to include here the three entries I wrote for the Enciclopedia delle 
Scienze Fisiche (which is now published). I also thank the Nuovo Cimento 
for allowing the use of the 1972 review paper on the Ising model. 

I am indebted for critical comments on the various drafts of the work, 
in particular, to G. Gentile whose comments have been an essential con- 
tribution to the revision of the manuscript; I am also indebted to several 
colleagues: P. Carta, E. Järvenpää, N. Nottingham and, furthermore, M. 
Campanino, V. Mastropietro, H. Spohn whose invaluable comments made 
the book more readable than it would otherwise have been. 


Giovanni Gallavotti 


Roma, January 1999 
§1.1. Introduction


Statistical mechanics poses the problem of deducing macroscopic properties 
of matter from the atomic hypothesis. According to the hypothesis matter 
consists of atoms or molecules that move subject to the laws of classical 
mechanics or of quantum mechanics. 

Matter is therefore thought of as consisting of a very large number N 
of particles, essentially point masses, interacting via simple conservative 
forces.¹

A microscopic state is described by specifying, at a given instant, the value 
of positions and momenta (or, equivalently, velocities) of each of the N 
particles. Hence one has to specify 3N + 3N coordinates that determine a 
point in phase space, in the sense of mechanics. 

It does not seem that in Boltzmann's original viewpoint particles were
really thought of as capable of assuming a 6N dimensional continuum
of states ([Bo74], p. 169):


Therefore if we wish to get a picture of the continuum in words, we first 
have to imagine a large, but finite number of particles with certain properties 
and investigate the behavior of the ensemble of such particles. Certain prop- 
erties of the ensemble may approach a definite limit as we allow the number 
of particles ever more to increase and their size ever more to decrease. Of 
these properties one can then assert that they apply to a continuum, and 
in my opinion this is the only non-contradictory definition of a continuum 
with certain properties 


and likewise the phase space itself is really thought of as divided into a finite 
number of very small cells of essentially equal dimensions, each of which 
determines the position and momentum of each particle with a maximum 
precision. 

This should mean the maximum precision that the most perfect measure- 
ment apparatus can possibly provide. And a matter of principle arises: can 
we suppose that every lack of precision can be improved by improving the 
instruments we use? 

If we believe this possibility then phase space cells, representing microscopic 
states with maximal precision, must be points and they must be conceived 
of as a 6N dimensional continuum. But since atoms and molecules are not 
directly observable one is legitimized in his doubts about being allowed to 
assume perfect measurability of momentum and position coordinates. 

In fact in “recent” times the foundations of classical mechanics have been
subject to intense critique and the indetermination principle postulates the
theoretical impossibility of the simultaneous measurement of a component
p of a particle momentum and of the corresponding component q of the
position with respective precisions $\delta p$ and $\delta q$ without the constraint:

$$\delta p\,\delta q \ge h \tag{1.1.1}$$

where $h = 6.62 \times 10^{-27}$ erg·sec is Planck’s constant.

¹ $N = 6.02 \times 10^{23}$ particles per mole = “Avogadro’s number”: this implies, for instance,
that 1 cm³ of Hydrogen, or of any other (perfect) gas, at normal conditions (1 atm at
0°C) contains about $2.7 \times 10^{19}$ molecules.

Without attempting a discussion of the conceptual problems that the above
brief and superficial comments raise it is better to proceed by imagining that
the microscopic states of an N-particle system are represented by phase space
cells consisting of the points of $R^{6N}$ with coordinates (e.g. [Bo77]):

$$p^0_\alpha - \delta p/2 \le p_\alpha \le p^0_\alpha + \delta p/2, \qquad q^0_\alpha - \delta q/2 \le q_\alpha \le q^0_\alpha + \delta q/2, \qquad \alpha = 1, \ldots, 3N \tag{1.1.2}$$

if $p_1, p_2, p_3$ are the momentum coordinates of the first particle, $p_4, p_5, p_6$
of the second, etc., and $q_1, q_2, q_3$ are the position coordinates of the first
particle, $q_4, q_5, q_6$ of the second, etc. The coordinates $p^0_\alpha$ and $q^0_\alpha$ are used
to identify the center of the cell, hence the cell itself.

The cell size will be supposed to be such that: 


$$\delta p\,\delta q = h \tag{1.1.3}$$


where h is an a priori arbitrary constant, which it is convenient not to fix 
because it is interesting (for the reasons just given) to see how the theory 
depends upon it. Here the meaning of h is that of a limitation to the preci- 
sion that is assumed to be possible when measuring a pair of corresponding 
position and momentum coordinates. 

Therefore the space of the microscopic states is the collection of the cubic
cells A, with volume $h^{3N}$, into which we imagine that the phase space is
divided. By assumption it has no meaning to pose the problem of attempting 
to determine the microscopic state with a greater precision. 
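
The cell bookkeeping just described can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the cell centers sit at integer multiples of $\delta p$ and $\delta q$ (the text does not fix the position of the lattice), and the function names are hypothetical.

```python
import math

def cell_index(p, q, dp, dq):
    """Integer labels of the cell containing the phase space point (p, q).

    p, q   : sequences holding the 3N momentum and 3N position coordinates
    dp, dq : sides of the cell, chosen so that dp * dq = h
    Assumption: cell centers are the integer multiples of dp and dq.
    """
    kp = tuple(math.floor(x / dp + 0.5) for x in p)
    kq = tuple(math.floor(x / dq + 0.5) for x in q)
    return kp, kq

def cell_center(index, dp, dq):
    """Center (p0, q0) of the cell with the given integer labels."""
    kp, kq = index
    return [k * dp for k in kp], [k * dq for k in kq]

def cell_volume(n_particles, dp, dq):
    """Volume h^(3N) of one cell, with h = dp * dq."""
    return (dp * dq) ** (3 * n_particles)
```

For one particle with dp = 0.5 and dq = 1.0, every point whose coordinates differ from a cell center by less than half a side gets the labels of that center; by assumption no finer distinction has a meaning.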

The optimistic viewpoint of orthodox statistical mechanics (which admits
perfect simultaneous measurements of positions and momenta as possible)
will be obtained by considering, in the more general theory with h > 0, the
limit as $h \to 0$, which will mean $\delta p = \lambda p_0$, $\delta q = \lambda q_0$, with $p_0, q_0$ fixed and
$\lambda \to 0$.

Even if we wish to ignore (one should not!) the development of quan- 
tum mechanics, the real possibility of the situation in which h = 0 cannot 
be directly checked because of the practical impossibility of observing an 
individual atom with infinite precision (or just with “great” precision). 


§1.2. Microscopic Dynamics


The atomic hypothesis, apart from supposing the existence of atoms and 
molecules, assumes also that their motions are governed by a deterministic 
law of motion. 


This hypothesis can be imposed by thinking that there is a map S:

$$S A = A' \tag{1.2.1}$$


transforming the phase space cells into each other and describing the system 
dynamics. 

If at time t the state of the system is microscopically determined by the 
phase space cell A, then at a later time $t + \tau$ it will be determined by the
cell A’. Here $\tau$ is a time step extremely small compared to the macroscopic
time intervals over which the system evolution is followed by an observer: 
it is, nevertheless, a time interval directly accessible to observation, at least 
in principle. 

The evolution law S is not arbitrary: it must satisfy some fundamental 
properties; namely it must agree with the laws of mechanics in order to 
properly enact the deterministic principle which is basic to the atomic hy- 
pothesis. 

This means, in essence, that one can associate with each phase space cell
three fundamental dynamical quantities: the kinetic energy, the potential
energy and the total energy, respectively denoted by $K(A)$, $\Phi(A)$, $E(A)$.

For simplicity assume the system to consist of N identical particles with
mass m, pairwise interacting via a conservative force with potential energy
$\varphi$. If A is the phase space cell determined by (see (1.1.2)) $(p^0, q^0)$, then the
above basic quantities are defined respectively by:

$$\begin{aligned}
K(p^0) &= K(A) = \sum_{i=1}^{N} (p^0_i)^2/2m \\
\Phi(q^0) &= \Phi(A) = \sum_{i<j}^{1,N} \varphi(q^0_i - q^0_j) \\
E(p^0, q^0) &= E(A) = K(p^0) + \Phi(q^0)
\end{aligned} \tag{1.2.2}$$

where $p^0_i = (p^0_{3i-2}, p^0_{3i-1}, p^0_{3i})$, $q^0_i = (q^0_{3i-2}, q^0_{3i-1}, q^0_{3i})$ are the momentum
and position of the i-th particle, i = 1, 2, ..., N, in the microscopic state
corresponding to the center $(p^0, q^0)$ of A.

Replacing $p^0, q^0$, i.e. the center of A, by another point (p, q) in A one
obtains values $K(p)$, $\Phi(q)$, $E(p, q)$ for the kinetic, potential and total energies
different from $K(A)$, $\Phi(A)$, $E(A)$; however such a difference has to be non
observable: otherwise the cells A would not be the smallest ones to be
observable, as supposed above.

If $\tau$ is a fixed time interval and we consider the solutions of Hamilton’s
equations of motion:

$$\dot q = \frac{\partial E}{\partial p}(p, q), \qquad \dot p = -\frac{\partial E}{\partial q}(p, q) \tag{1.2.3}$$

with initial data $(p^0, q^0)$ at time 0, the point $(p^0, q^0)$ will evolve in time $\tau$
into a point $(p', q') = S_\tau(p^0, q^0)$. One then defines S so that
SA = A’ if A’ is the cell containing (p’, q’). The evolution (1.2.3) may send
a few particles outside the volume V, cubic for simplicity, that we imagine to 
contain the particles: one has therefore to supplement (1.2.3) by boundary 
conditions that will tell us the physical nature of the walls of V. 

They could be reflecting, if the collisions with the walls are elastic, or 
periodic if the opposite faces of the region V are identified (a very convenient 
“mathematical fiction”, useful to test various models and to minimize the 
“finite size” effects, i.e. dependence of observations on system size). 
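
The construction of S from the flow of (1.2.3) can be sketched numerically for a single degree of freedom. The sketch below is only an illustration under stated assumptions: the flow $S_\tau$ is approximated with a leapfrog scheme (a choice made here for convenience, not something the text prescribes), cell centers are taken at integer multiples of $\delta p$ and $\delta q$, and periodic walls are imposed by reducing q modulo the box length. All names are hypothetical.

```python
import math

def flow(p, q, tau, force, m=1.0, box=None, substeps=100):
    """Approximate S_tau for one degree of freedom with a leapfrog scheme.

    force(q) is minus the derivative of the potential energy.
    If box is given, opposite faces are identified (periodic walls).
    """
    dt = tau / substeps
    for _ in range(substeps):
        p += 0.5 * dt * force(q)      # half kick
        q += dt * p / m               # drift
        if box is not None:
            q %= box                  # periodic boundary condition
        p += 0.5 * dt * force(q)      # half kick
    return p, q

def cell_map(kp, kq, tau, dp, dq, force, m=1.0, box=None):
    """S: evolve the center of cell (kp, kq) for a time tau and
    return the labels of the cell containing the evolved point."""
    p, q = flow(kp * dp, kq * dq, tau, force, m=m, box=box)
    new_kp = math.floor(p / dp + 0.5)
    new_kq = math.floor(q / dq + 0.5)
    if box is not None:
        new_kq %= round(box / dq)     # cells on opposite faces coincide
    return new_kp, new_kq
```

For a free particle (zero force) starting at the center of the cell (2, 0) with dp = 0.5 and dq = 0.3, the point drifts by one unit of length in unit time, so the cell is sent to (2, 3); with a periodic box of length 0.9 the same evolution returns it to (2, 0).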

One cannot, however, escape some questions of principle on the structure 
of the map S that it is convenient not to ignore, although their deep under- 
standing may become a necessity only on a second reading. 

First we shall neglect the possibility that (p’, q’) is on the boundary of a cell 
(a case in which A’ is not uniquely determined, but which can be avoided 
by imagining that the cell walls are slightly deformed).

More important, in fact crucial, is the question of whether $S A_1 = S A_2$
implies $A_1 = A_2$: the latter is a property which is certainly true in the
case of point cells (h = 0), because of the uniqueness of the solutions of 
differential equations. It has an obvious intuitive meaning and an interest 
due to its relation with reversibility of motion. 

In the following analysis a key role is played by Liouville’s theorem which
tells us that the transformation mapping a generic initial datum (p, q) into
the configuration (p’, q’) = S(p, q) is a volume preserving transformation.
This means that the set of initial data (p, q) in A evolves in the time $\tau$ into
a set $\tilde A$ with volume equal to that of A. Although having the same volume
as A it will no longer have the same form of a square parallelepiped with
dimensions $\delta p$ or $\delta q$. For h small it will be a rather small parallelepiped
obtained from A via a linear transformation that expands in certain directions
while contracting in others.

It is also clear that in order that the representation of the microscopic 
states of the system be consistent it is necessary to impose some non trivial 
conditions on the time interval, so far unspecified, that elapses between 
successive (thought) observations of the motions. Such conditions can be 
understood via the following reasoning. 

Suppose that h is very small (actually by this we mean, here and below,
that both $\delta p$ and $\delta q$ are small) so that the image $\tilde A$ of the cell A under
the evolution can be regarded as obtained by translating A and possibly by
deforming it via a linear dilatation in some directions or a linear contraction
in others (contraction and dilatation balance each other because, as remarked,
the volume remains constant). This is easily realized if h is small enough
since the solutions to ordinary differential equations can always be thought
of, locally, as linear transformations close to the identity, for small
evolution times $\tau$. Then:


i) If S dilates and contracts in various directions, even by a small amount,
there must necessarily exist pairs of distinct cells $A_1 \ne A_2$ for which
$S A_1 = S A_2$: an example is provided by the map of the plane transforming (x, y)
into $S(x, y) = ((1+\varepsilon)^{-1} x, (1+\varepsilon) y)$, $\varepsilon > 0$, and its action on the lattice of
the integers. Assuming that one decides that the cell A’ into which a given
cell A evolves is, among those which intersect its image SA, the one which
has the largest intersection with it, then indetermination arises for a set of
cells spaced by about $\varepsilon^{-1}$.
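
The lattice example can be checked with a short Python sketch. It makes one simplifying assumption: the "largest intersection" rule is implemented by rounding the image of the cell center to the nearest integer point, which gives the same cell for unit square cells centered on the integers (ties aside). The function names are hypothetical.

```python
from collections import Counter

def image_cell(i, j, eps):
    """Cell assigned to the image of cell (i, j) under
    S(x, y) = (x / (1 + eps), (1 + eps) * y), by rounding the image
    of the cell center to the nearest lattice point."""
    return (round(i / (1.0 + eps)), round(j * (1.0 + eps)))

def collision_fraction(n, eps):
    """Fraction of the (2n+1)^2 cells with |i|, |j| <= n whose image
    cell is shared with at least one other cell."""
    cells = [(i, j) for i in range(-n, n + 1) for j in range(-n, n + 1)]
    hits = Counter(image_cell(i, j, eps) for i, j in cells)
    colliding = sum(1 for (i, j) in cells if hits[image_cell(i, j, eps)] > 1)
    return colliding / len(cells)
```

In the contracting direction the images of consecutive integers are spaced by $(1+\varepsilon)^{-1} < 1$, so two of them fall into the same cell roughly once every $\varepsilon^{-1}$ cells; for $\varepsilon = 0$ the map is the identity and no collision occurs.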


It is therefore necessary that $\tau$ be small, say:

$$\tau < \vartheta_+ \tag{1.2.4}$$

with $\vartheta_+$ such that the map S (associated with $S_\tau$, see (1.2.3), hence close
to the identity) produces contractions and expansions of A that can be
neglected for the large majority of the cells A. Only in this way will it be
possible that $S A_1 = S A_2$ with $A_1 \ne A_2$ for just a small fraction of the cells
and, hence, one can hope that this possibility is negligible.

It should be remarked explicitly that the above point of view is systemat- 
ically taken by physicists performing numerical experiments. Phase space 
is represented in computers as a finite, but very large, set of points whose 
positions are changed by the time evolution (how many depends on the 
precision of the representation of the reals). Even if the system studied is 
modeled by a nice differential equation with global uniqueness and existence 
of solutions, the computer program, while trying to generate a permutation 
of phase space points, will commit errors, i.e. two distinct points will be 
sent to the same point (we do not talk here of round-off errors, which are 
not really errors as they are a priori known, in principle): one thus hopes 
that such errors are rare enough to be negligible. This seems inevitable 
except in some remarkable cases, the only nontrivial one I know of being in 
[LV93]. 


ii) But $\tau$ cannot be too small either, if one wishes to maintain coherently
the point of view that microscopic states are described by phase space cells.
In fact to a cell A is associated a natural time scale $\vartheta_-(A)$, which can be
defined as the minimum time in order that A becomes distinguishable from
the cell into which it evolves in time $\vartheta_-(A)$. And $\tau$ must be necessarily
larger than the latter minimum time scale:


$$\vartheta_-(A) < \tau \tag{1.2.5}$$


(otherwise we have Zeno’s paradox and nothing moves).


Summarizing we can say that in order to be able to define the dynamic
evolution as a map permuting the phase space cells, $\tau$ must be
chosen so that:


$$\vartheta_- \equiv \mathop{\text{``max''}}_{A}\,\vartheta_-(A) < \tau < \vartheta_+ \tag{1.2.6}$$


where the quotes mean that the maximum has to be taken as A varies
within the “majority” of the cells, where one can suppose that $A_1 \ne A_2$
implies $S A_1 \ne S A_2$.


One should realize that if $\varphi$ is a “reasonable” molecular potential (a typical
model for $\varphi$ is, for instance, the Lennard-Jones potential with intensity $\varepsilon$
and range $r_0$ given by: $\varphi(r) = 4\varepsilon\big((r_0/r)^{12} - (r_0/r)^{6}\big)$) it will generically be that:

$$\lim_{h \to 0} \vartheta_- = 0 \tag{1.2.7}$$

while for small h the right hand side of (1.2.6) (which has a purely kinematical
nature) becomes h independent.

Hence it will be possible, at least in the limit in which h tends to 0, to define
a $\tau$ so that (1.2.4), (1.2.5) hold; i.e. it will be possible to fulfill the above
consistency criteria for describing the microscopic states of the system via
finite cells.

On the other hand if h > 0, and a posteriori one should think that
$h = 6.62 \times 10^{-27}$ erg·sec, the question we are discussing becomes quite delicate,
if only because we do not know what we should understand when we
say the “large majority” of the phase space cells.

In fact, on the basis of the results of the theory it will become possible to
evaluate the influence on the results themselves of the existence of pairs of
cells $A_1 \ne A_2$ with $S A_1 = S A_2$.

Logically at this point the analysis of the question should be postponed 
until the consequences of the hypotheses that we are assuming allow us to 
reexamine it. It is nevertheless useful, in order to better grasp the delicate 
nature of the problem and the orders of magnitude involved, to anticipate 
some of the basic results and to provide estimates of $\vartheta_\pm$: readers preferring
to think in purely classical terms, by imagining that h = 0 on the basis of
a dogmatic interpretation of the (classical) atomic hypothesis, can skip the
discussion and proceed by systematically taking the limit as $h \to 0$ of the
theory that follows.

It is however worth stressing that setting h = 0 is an illusory simplification 
avoiding posing a problem that is today well known to be deep. Assuming 
that, at least in principle, it should be possible to measure exactly positions 
and momenta of a very large number of molecules (or even of a single one) 
means supposing it is possible to perform a physical operation that no one 
would be able to perform. It was the obvious difficulty, one should recall, of 
such an operation that in the last century made it hard for some to accept 
the atomic hypothesis. 

Coming to the problem of providing an idea of the orders of magnitude of
$\vartheta_\pm$ one can interpret “max” in (1.2.6) as evaluated by considering as typical
cells those for which the momenta and the reciprocal distances of the particles
take values “close” to their “average values”. The theory of statistical
ensembles (see below) will lead to a natural probability distribution giving 
the probability of each cell in phase space, when the system is in macro- 
scopic equilibrium. Therefore we shall be able to compute, by using this 
probability distribution, the average values of various quantities in terms of 
macroscopic quantities like the absolute temperature T, the particle mass 
m, the particle number N, and the volume V available to the system. 


The main property of the probability distributions of the microscopic states
observed in a situation in which the macroscopic state of the system is in
equilibrium is that the average velocity and average momentum $\bar v$, $\bar p$ will be
related to the temperature by:


$$\bar p = m \bar v = \sqrt{3 m k_B T}, \qquad m \bar v^2 = 3 k_B T \tag{1.2.8}$$


where $k_B = 1.38 \times 10^{-16}$ erg °K$^{-1}$ is the universal Boltzmann’s constant.

Other relevant quantities are the characteristic parameters of the interaction,
i.e. the strength $\varepsilon$ with the dimension of energy and the range $r_0$ with
the dimension of length. It follows from the developments of the theory
of equilibrium statistical mechanics, independently of the particular form of
$\varphi(r)$ (as long as it is “reasonable”, like for instance the above mentioned
Lennard-Jones potential), that $\varepsilon \simeq k_B T^0_c$ where $T^0_c$ is the critical liquefaction
temperature and $r_0$ is of the order of the molecular diameter (between
$2 \times 10^{-8}$ cm and $4 \times 10^{-8}$ cm in the simplest gases like H₂, He, O₂, CO₂, see
Chap. V).

We estimate $\vartheta_+$ first (the time scale over which expansion and contraction
of a phase space cell become appreciable) looking at a typical cell where one
can assume that the particles evolve in time without undergoing multiple
collisions. In such a situation the relative variation of a linear dimension of
A in the time $\tau$ will be, for small $\tau$, proportional to $\tau$ and it may depend on
$\varepsilon, m, r_0, \bar v$: the pure numbers related to $\tau$ and to the phase space dilatations
(i.e. to the derivatives of the forces appearing in the equations of motion)
that one can form with the above quantities are $\tau \sqrt{\varepsilon/m r_0^2}$ and $\tau\,\bar v/r_0$.
Hence the phase space changes in volume will be negligible, recalling that
$m \bar v^2 = 3 k_B T$, from (1.2.8), and setting $\varepsilon = k_B T^0_c$, provided (up to
numerical factors of order 1):


$$\tau < \min\left( \left(\frac{m r_0^2}{k_B T^0_c}\right)^{1/2}, \left(\frac{m r_0^2}{k_B T}\right)^{1/2} \right) \equiv \vartheta_+ \tag{1.2.9}$$


The condition $\tau \sqrt{\varepsilon/m r_0^2} \ll 1$ means that, even during a microscopic collision
taking place while the time $\tau$ elapses, there is no appreciable expansion while
the second condition $\tau \ll r_0/\bar v$ means that the time $\tau$ is short with respect
to the total collision duration (which therefore takes several units of $\tau$ to
be completed).
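
To get a feeling for the numbers, the following Python sketch evaluates $\bar v$ from (1.2.8) and the bound $\vartheta_+$ in the order-of-magnitude form $\min\big((m r_0^2/k_B T^0_c)^{1/2}, (m r_0^2/k_B T)^{1/2}\big)$. The hydrogen-like inputs in the usage note are assumptions made for illustration: $r_0 = 3 \times 10^{-8}$ cm lies in the range quoted above, while the critical temperature of about 33 °K is a handbook value that does not come from the text.

```python
import math

K_B = 1.38e-16  # Boltzmann's constant in erg/K, as quoted in the text

def mean_speed(m, T):
    """Average speed v-bar from m * v^2 = 3 * k_B * T, eq. (1.2.8)."""
    return math.sqrt(3.0 * K_B * T / m)

def theta_plus(m, r0, T, Tc0):
    """Order-of-magnitude upper bound on the time step tau,
    with the interaction strength set to eps = k_B * Tc0."""
    t_interaction = math.sqrt(m * r0**2 / (K_B * Tc0))
    t_crossing = math.sqrt(m * r0**2 / (K_B * T))
    return min(t_interaction, t_crossing)
```

With m = 3.34e-24 g, T = 273 °K, r0 = 3e-8 cm and Tc0 = 33 °K (assumed values), `mean_speed` is of the order of 10^5 cm/sec and `theta_plus` of the order of 10^-13 sec; at temperatures above Tc0 the bound is set by the second term, i.e. by the time needed to cross a molecular diameter.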

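To estimate $\vartheta_-$ (the time scale over which a cell evolves enough to be
distinguishable from itself) note that, given A, the coordinates $p_\alpha, q_\alpha$ of the
phase space points in the cell A change obeying the Hamiltonian equations
of motion, in the time $\tau$, by:


$$|\delta q_\alpha| = \left| \tau\,\frac{\partial E}{\partial p_\alpha}(p, q) \right| = \tau\,\frac{\delta^{(p)}_\alpha E}{\delta p}, \qquad |\delta p_\alpha| = \left| \tau\,\frac{\partial E}{\partial q_\alpha}(p, q) \right| = \tau\,\frac{\delta^{(q)}_\alpha E}{\delta q} \tag{1.2.10}$$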


10 I. Classical Statistical Mechanics 


where $\delta^{(p)}_\alpha E$, $\delta^{(q)}_\alpha E$ are the variations of the energy E in the cell A when the
coordinates $p_\alpha$ or, respectively, $q_\alpha$ vary by the amount $\delta p$ or $\delta q$, i.e. they
vary by a quantity equaling the linear dimensions of the cell A, while the
others stay constant (so that the variations in (1.2.10) are related to the
partial derivatives of the energy function E).
Defining therefore the energy indetermination, which we denote by $\delta E(A)$,
in the cell A as:

$$\delta E(A) = \max_\alpha \left( \delta^{(p)}_\alpha E, \delta^{(q)}_\alpha E \right) \tag{1.2.11}$$


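we see that the minimum time $\vartheta_-(A)$ that one has to wait in order to see
that the cell evolves into a cell which is distinguishable from A itself is such that:


$$\vartheta_-(A)\,\max_\alpha \frac{\delta^{(p)}_\alpha E}{\delta p} > \delta q, \quad \text{or} \quad \vartheta_-(A)\,\max_\alpha \frac{\delta^{(q)}_\alpha E}{\delta q} > \delta p \tag{1.2.12}$$


in fact, $\delta p$ and $\delta q$ being the linear dimensions of A, Eq. (1.2.12) just says
that at least one of the sides of A has moved away by a quantity of the
order of its length (thus becoming distinguishable from itself).

Since we set $\delta p\,\delta q = h$ one deduces from (1.2.11), (1.2.12):


$$\vartheta_-(A)\,\delta E(A) > h \tag{1.2.13}$$

and $\vartheta_- = \delta t$ can be chosen so that, introducing the notation:

$$\delta E = \mathop{\text{``min''}}_{A}\,\delta E(A) \tag{1.2.14}$$

we have:

$$\delta t\,\delta E = h. \tag{1.2.15}$$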


We can therefore see, on the basis of (1.2.9), (1.2.15), whether or not an
interval $(\vartheta_-, \vartheta_+)$ admissible for $\tau$ exists. We can in fact imagine that $\delta p \simeq \bar p$,
hence $\delta E = \bar p\,\delta p/m \simeq \bar p^2/m = 3 k_B T$ and:


$$\vartheta_- = h/k_B T. \tag{1.2.16}$$


Equation (1.2.16) gives a remarkable interpretation of the time scale $h/k_B T$:
it is the time necessary so that a phase space cell, typical among those
describing the microscopic equilibrium states at temperature T, becomes
distinguishable from itself.

One can say, differently, that $\vartheta_-$ is determined by the size of $\dot p, \dot q$, i.e. by
the size of the first derivatives of the Hamiltonian, while $\vartheta_+$ is related to the
phase space expansion, i.e. to the second derivatives of the Hamiltonian.

With some algebra one derives, from (1.2.9), (1.2.16): 


$$\vartheta_+/\vartheta_- = (m r_0^2 k_B T^0_c/h^2)^{1/2}\,\min\big( T/T^0_c, (T/T^0_c)^{1/2} \big). \tag{1.2.17}$$
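
The algebra can be spelled out as follows. This is a sketch that takes $\vartheta_+$ from (1.2.9) in the order-of-magnitude form $\min\big((m r_0^2/k_B T^0_c)^{1/2}, (m r_0^2/k_B T)^{1/2}\big)$ and $\vartheta_- = h/k_B T$ from (1.2.16); the shorthand $a$ is introduced here only for convenience.

```latex
\begin{aligned}
\frac{\vartheta_+}{\vartheta_-}
  &= \frac{k_B T}{h}\,
     \min\!\left( \Big(\frac{m r_0^2}{k_B T^0_c}\Big)^{1/2},
                  \Big(\frac{m r_0^2}{k_B T}\Big)^{1/2} \right) \\
  &= a\,\min\!\left( \frac{T}{T^0_c},\ \Big(\frac{T}{T^0_c}\Big)^{1/2} \right),
  \qquad a = \frac{(m r_0^2 k_B T^0_c)^{1/2}}{h} .
\end{aligned}
```

The ratio therefore exceeds 1 exactly when both $T/T^0_c > a^{-1}$ and $T/T^0_c > a^{-2}$ hold, i.e. when $T/T^0_c > \max(a^{-1}, a^{-2})$.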


Therefore it is clear that the relation $\vartheta_+/\vartheta_- > 1$, necessary for a consistent
description of the microscopic states in terms of phase space cells, will be
satisfied for large T, say $T \gg T_0$, but not for small T (unless one takes
h = 0). And from the expression just derived for the ratio $\vartheta_+/\vartheta_-$ one gets
$\vartheta_+/\vartheta_- > 1$ if $T > T_0$ with:

$$T_0/T^0_c = \max\left( \frac{h}{(m r_0^2 k_B T^0_c)^{1/2}}, \frac{h^2}{m r_0^2 k_B T^0_c} \right) \tag{1.2.18}$$

Table 1.1 below gives an idea of the orders of magnitude: it is elaborated
having chosen $h = 6.62 \times 10^{-27}$ erg·sec, i.e. Planck’s constant. A similar
table would be derived if, ignoring Planck’s constant suggested as a natural
action unit from quantum mechanics, we took $\delta p \sim \bar p$, $\delta q \sim \sqrt[3]{V/N}$, as
Boltzmann himself did when performing various conceptual calculations,
[Bo96], [Bo97], in his attempts to explain why his physical theory did not
contradict mathematical logic.

In fact with the latter choice the action unit $\delta p\,\delta q = \bar p\,\sqrt[3]{V/N}$ would be,
in “reasonable cases” (1 cm³ of hydrogen, $m = 3.34 \times 10^{-24}$ g, $T = 273$ °K,
$N = 2.7 \times 10^{19}$, $k_B = 1.38 \times 10^{-16}$ erg °K$^{-1}$), of the same order of magnitude
as Planck’s constant, namely it would be $\delta p\,\delta q \simeq 2.04 \times 10^{-25}$ erg·sec.

The corresponding order of magnitude of $\vartheta_-$ is $\vartheta_- \simeq 5.4 \times 10^{-12}$ sec.
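
The arithmetic behind these two figures can be retraced with a few lines of Python. The sketch uses only the data quoted in the paragraph above together with (1.2.8) and (1.2.16); the function names are hypothetical.

```python
import math

K_B = 1.38e-16  # Boltzmann's constant, erg/K

def boltzmann_action_unit(m, T, number_density):
    """Action unit p-bar * (V/N)^(1/3) with p-bar = sqrt(3 m k_B T).

    number_density is N/V in particles per cm^3.
    """
    p_bar = math.sqrt(3.0 * m * K_B * T)
    dq = (1.0 / number_density) ** (1.0 / 3.0)
    return p_bar * dq

def theta_minus(h, T):
    """Time scale h / (k_B T) of eq. (1.2.16) for a given action unit h."""
    return h / (K_B * T)

# Hydrogen at normal conditions, with the values quoted in the text.
h_boltzmann = boltzmann_action_unit(3.34e-24, 273.0, 2.7e19)
print(h_boltzmann, theta_minus(h_boltzmann, 273.0))
```

The same function `theta_minus` evaluated with Planck's constant, h = 6.62e-27 erg·sec, gives a time scale shorter by the ratio of the two action units.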

The sizes of the estimates for $T_0/T^0_c$ in the table show that the question
of logical consistency of the microscopic states representation in terms of
phase space cells permuted by the dynamics, if taken literally, depends in
a very sensitive way on the value of h and, in any event, it is doomed to
inconsistency if $T \to 0$ and $\varepsilon \ne 0$ (hence $\vartheta_- \to +\infty$ and
$\vartheta_+ \to \sqrt{m r_0^2/\varepsilon} < +\infty$).


Table 1.1: Orders of magnitude ($N_A$ denotes Avogadro’s number)


The columns A, B give empirical data, directly accessible from experiments and expressed
in cgs units (i.e. A in erg·cm³ and B in cm³), of the van der Waals’ equation of state.
If $n = N/N_A$ = number of moles, $R = k_B N_A$, see §5.1 for (∗), (∗∗) below, then the
equation of state is:

$$(P + A n^2/V^2)(V - nB) = nRT \tag{$*$}$$


which is supposed here in order to derive values for $\varepsilon, r_0$ via the relations:


$$(B/N_A) = 4\,\frac{4\pi}{3}\left(\frac{r_0}{2}\right)^3 = 4 v_0, \qquad A/N_A^2 = \frac{32}{3}\,\varepsilon v_0 \tag{$**$}$$


which lead to the expressions (see §4.3) $r_0 = (3B/2\pi N_A)^{1/3}$, $\varepsilon = 3A/8BN_A =
81 k_B T^0_c/64$; Ti"“* = experimental value of the critical temperature ~ To.
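
As a hedged illustration of how the entries of such a table can be produced, the Python sketch below turns van der Waals constants A, B into $r_0$, $\varepsilon$, $T^0_c$ through the relations above, and then evaluates $T_0/T^0_c$ from (1.2.18). The numerical values of A and B used in the usage note are handbook values for hydrogen converted to cgs units; they are assumptions brought in for the example and are not taken from Table 1.1.

```python
import math

K_B = 1.38e-16  # Boltzmann's constant, erg/K
N_A = 6.02e23   # Avogadro's number, 1/mol
H = 6.62e-27    # Planck's constant, erg sec

def molecular_parameters(A, B):
    """Return (r0, eps, Tc0) from the van der Waals constants.

    A in erg cm^3 / mol^2, B in cm^3 / mol.
    Uses r0 = (3B / 2 pi N_A)^(1/3), eps = 3A / (8 B N_A) = 81 k_B Tc0 / 64.
    """
    r0 = (3.0 * B / (2.0 * math.pi * N_A)) ** (1.0 / 3.0)
    eps = 3.0 * A / (8.0 * B * N_A)
    Tc0 = 64.0 * eps / (81.0 * K_B)
    return r0, eps, Tc0

def t0_over_tc0(m, r0, Tc0, h=H):
    """Right hand side of (1.2.18)."""
    a_squared = m * r0**2 * K_B * Tc0 / h**2
    return max(1.0 / math.sqrt(a_squared), 1.0 / a_squared)
```

For instance, with the assumed hydrogen values A = 2.476e11 and B = 26.61, `molecular_parameters` gives an $r_0$ inside the range of molecular diameters quoted in §1.2 and a $T^0_c$ of a few tens of °K, which can then be passed to `t0_over_tc0` with m = 3.34e-24 g.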
In this book I will choose the attitude of not attempting to discuss which
would be the structure of a statistical mechanics theory of phenomena
below $T_0$ if a strict, “axiomatic”, classical viewpoint were taken assuming
$\delta p = 0$, $\delta q = 0$: the theory would be extremely complicated as discovered
in the famous simulation [FPU55] and it is still not well understood even
though it is full of very interesting phenomena, see [GS72], [Be94], [Be97]
and Chap. III, §3.2.


§1.3. Time Averages and the Ergodic Hypothesis


We are led, therefore, to describe a mechanical system of N identical mass
m particles (at least at not too low temperatures, $T > T_0$, see (1.2.18))
in terms of (a) an energy function (“Hamiltonian”) defined on the
6N-dimensional phase space and (b) a subdivision of such a space into cells
A of equal volume $h^{3N}$, whose size is related to the highest precision with
which we presume to be able to measure positions and momenta or times
and energies.

Time evolution is studied on time intervals multiples of a unit $\tau$: large
compared to the time scale $\delta t$ associated with the cell decomposition of
phase space by (1.2.15), (1.2.16) and small compared with the collision 
time scale (1.2.9): see [Bo74], p. 44, 227. In this situation time evolution 
can be regarded as a permutation of the cells with given energy: we neglect, 
in fact, on the basis of the analysis in §1.2 the possibility that there may be 
a small fraction of different cells evolving into the same cell. 

In this context we ask what will be the qualitative behavior of the system
with an energy “fixed” macroscopically, i.e. in an interval between E − DE
and E, if its observations are timed at intervals $\tau$ and the quantity DE is
macroscopically small but $DE \gg \delta E = h/\delta t$; see (1.2.15), (1.2.16).

Boltzmann assumed, very boldly, that in the interesting cases the ergodic 
hypothesis held, according to which ([Bo71], [Bo84], [Ma79]): 


Ergodic hypothesis: the action of the evolution transformation S, as a cell
permutation of the phase space cells on the surface of constant energy, is a
one cycle permutation of the $\mathcal N$ phase space cells with the given energy:


$$S A_k = A_{k+1}, \qquad k = 1, 2, \ldots, \mathcal N \tag{1.3.1}$$

if the cells are suitably enumerated (and $A_{\mathcal N + 1} \equiv A_1$).


In other words as time evolves every cell evolves, visiting successively all 
other cells with equal energy. The action of S is the simplest thinkable 
permutation! 

Even if not strictly true this should hold at least for the purpose of com- 
puting the time averages of the observables relevant for the macroscopic 
properties of the system. 

The basis for such a celebrated (and much criticized) hypothesis rests on 
its conceptual simplicity: it says that in the system under analysis all cells 
with the same energy are equivalent. 
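
As an illustrative aside, the hypothesis can be phrased on a toy phase space.
The sketch below assumes that the cells are relabelled 0, ..., n-1 and that
the evolution $S$ is stored as a list `perm`, with `perm[k]` the cell into
which cell k evolves in one time step; the names `orbit`, `is_one_cycle`,
`shift` and `two_cycles` are hypothetical and chosen only for this example.

```python
def orbit(perm, start):
    """Cells visited from `start` under repeated application of S,
    up to (and excluding) the first return to `start`."""
    visited = [start]
    cell = perm[start]
    while cell != start:
        visited.append(cell)
        cell = perm[cell]
    return visited


def is_one_cycle(perm):
    """True if S permutes all the cells in a single cycle (toy form of (1.3.1))."""
    return len(orbit(perm, 0)) == len(perm)


n_cells = 6
# S maps cell k to cell k+1, with the last cell mapped back to the first.
shift = [(k + 1) % n_cells for k in range(n_cells)]
# The same cells split into two cycles of length 3: the hypothesis fails.
two_cycles = [1, 2, 0, 4, 5, 3]

print(is_one_cycle(shift), is_one_cycle(two_cycles))
```

In the second permutation a cell starting in {3, 4, 5} never visits the cells
{0, 1, 2}, which is the toy counterpart of the counterexamples discussed next.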
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There are cases (already well known to Boltzmann, [Bo84]) in which the 
hypothesis is manifestly false: for instance if the system is enclosed in a 
perfect spherical container then the evolution keeps the angular momentum 
$M(\Delta) = \sum_{i=1}^{N} q_i \wedge p_i$ constant. Hence cells with a different total angular
momentum cannot evolve into each other. 

This extends to the cases in which the system admits other constants of mo- 
tion, besides the energy, because the evolving cells must keep the constants 
of motion equal to their initial values. And this means that the existence of 
other constants of motion besides the energy is, essentially, the most general 
case in which the ergodic hypothesis fails: in fact when the evolution is not 
a single cycle permutation of the phase space cells with given energy, then 
one can decompose it into cycles. One can correspondingly define a function 
A by associating with each cell of the same cycle the very same (arbitrarily 
chosen) value of A, different from that of cells of any other cycle. 

Obviously the function A so defined is a constant of motion that can play 
the same role as the angular momentum in the previous example. 
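
The construction just described can be sketched in a few lines, again under
the assumption (made only for illustration) that the cells are labelled
0, ..., n-1 and that $S$ is given as a list `perm`. The hypothetical function
`cycle_labels` assigns to every cell the index of its cycle, and this index is
then a constant of motion in the sense used above.

```python
def cycle_labels(perm):
    """Return a list A with A[cell] = index of the cycle of `perm` containing `cell`.

    Cells of the same cycle get the same value, cells of different cycles
    get different values, so A is constant along the motion.
    """
    label = [None] * len(perm)
    n_cycles = 0
    for start in range(len(perm)):
        if label[start] is not None:
            continue  # this cell already belongs to a labelled cycle
        cell = start
        while label[cell] is None:
            label[cell] = n_cycles
            cell = perm[cell]
        n_cycles += 1
    return label


perm = [1, 2, 0, 4, 3, 5]          # cycles (0 1 2), (3 4), (5)
A = cycle_labels(perm)
# A does not change when a cell is evolved by one time step.
conserved = all(A[perm[cell]] == A[cell] for cell in range(len(perm)))
print(A, conserved)
```

The values assigned to the cycles are arbitrary, exactly as in the text: any
relabelling of the cycles gives an equally good constant of motion.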

Thus, if the ergodic hypothesis failed to be verified, then the system would 
be subject to other conservation laws, besides that of the energy. In such 
cases it would be natural to imagine that all the conserved quantities were 
fixed and to ask oneself which are the qualitative properties of the motions 
with energy E, when all the other constants of motion are also fixed. Clearly 
in this situation the motion will be by construction a simple cyclic permuta- 
tion of all the cells compatible with the prefixed energy and other constants 
of motion values. 

Hence it is convenient to define formally the notion of ergodic probability 
distribution on phase space: 


Definition: a set of phase space cells is ergodic if S maps it into itself and 
if S acting on the set of cells is a one-cycle permutation of them. 


Therefore, in some sense, the ergodic hypothesis would not be restrictive 
and it would simply become the statement that one studies the motion 
after having a priori fixed all the values of the constants of motion. 

The latter remark, as Boltzmann himself realized, does not make less inter- 
esting the concrete question of determining whether a system is ergodic in 
the strict sense of the ergodic hypothesis (i.e. no other constants of motion 
besides the energy). On the contrary it serves well to put in evidence some 
subtle and deep aspects of the problem. 

In fact the decomposition of $S$ into cycles (ergodic decomposition of $S$)
might turn out to be so involved and intricate as to render its construction
practically impossible, i.e. useless for practical purposes. This would
happen if the regions of phase space corresponding to the various cycles were
(at least in some directions) of microscopic size or of size much smaller than
a macroscopic size, or if they were very irregular on a microscopic scale: a
situation quite different from the above simple example of the
conservation of angular momentum.


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It is not at all inconceivable that in interesting systems there could be 
very complicated constants of motion, without a direct macroscopic physical 
meaning: important examples are discussed in [Za89]. 

Therefore the ergodic problem, i.e. the problem of verifying the validity 
of the ergodic hypothesis for specific systems, in cases in which no partic- 
ular symmetry properties can be invoked to imply the existence of other 
constants of motion, is a problem that remains to be understood on a case- 
by-case analysis. A satisfactory solution would be the proof of strict validity 
of the ergodic hypothesis or the possibility of identifying the cycles of S via 
level surfaces of simple functions admitting a macroscopic physical meaning 
(e.g. simple constants of motion associated with macroscopic “conservation 
laws”, as in the case of the angular momentum illustrated above). 

It is useful to stress that one should not think that there are no other simple 
and interesting cases in which the ergodic hypothesis is manifestly false. The 
most classical example is the chain of harmonic oscillators, described by:


$$E(p,q) = \sum_{i=1}^{N} \frac{p_i^2}{2m} + \sum_{i=1}^{N} \frac{m\,(q_{i+1} - q_i)^2}{2} \tag{1.3.2}$$


where, for simplicity, $q_{N+1} = q_1$ (periodic boundary condition).
In this case there exist a large number of constants of motion, namely $N$:


$$A_k = (p\cdot\eta_k)^2 + \omega(k)^2\,(q\cdot\eta_k)^2, \qquad k = 1, 2, \ldots, N \tag{1.3.3}$$
where $\eta_1, \eta_2, \ldots, \eta_N$ are $N$ suitable orthonormal vectors
(normal modes) and $\omega(k)$ are the “intrinsic pulsations” of the chain:


$$\omega(k)^2 = 2\,(1 - \cos 2\pi k/N). \tag{1.3.4}$$


The constants of motion in (1.3.3) can be arranged into an $N$-vector
$A(\Delta) = (A_1(\Delta), A_2(\Delta), \ldots, A_N(\Delta))$. The phase space
cells $\Delta$ and $\Delta'$ for which the vectors $A(\Delta)$ and
$A(\Delta')$ do not coincide cannot belong to the same cycle, so that the
system is not ergodic.
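
The pulsations (1.3.4) can be checked numerically. The sketch below assumes
unit mass, so that the equations of motion read $\ddot q = -Kq$ with
$(Kq)_j = 2q_j - q_{j+1} - q_{j-1}$ (periodic indices), and it uses
unnormalized cosine vectors as trial normal modes; the function names are
hypothetical. It only verifies that each such vector is an eigenvector of $K$
with eigenvalue $\omega(k)^2$, and it does not construct the orthonormal
basis $\eta_k$.

```python
import math


def omega_squared(k, n):
    """Squared pulsation of mode k for a chain of n oscillators, as in (1.3.4)."""
    return 2.0 * (1.0 - math.cos(2.0 * math.pi * k / n))


def apply_coupling(q):
    """Return K q with (K q)_j = 2 q_j - q_{j+1} - q_{j-1}, indices taken mod n."""
    n = len(q)
    return [2.0 * q[j] - q[(j + 1) % n] - q[(j - 1) % n] for j in range(n)]


def max_mode_error(n):
    """Largest deviation of K eta from omega(k)^2 eta over the modes k = 1..n."""
    worst = 0.0
    for k in range(1, n + 1):
        # Unnormalized cosine mode; normalization does not affect the check.
        eta = [math.cos(2.0 * math.pi * k * j / n) for j in range(n)]
        k_eta = apply_coupling(eta)
        for j in range(n):
            worst = max(worst, abs(k_eta[j] - omega_squared(k, n) * eta[j]))
    return worst


print(max_mode_error(8))
```

The printed deviation is at the level of floating point rounding, reflecting
the identity $2\cos\theta j - \cos\theta(j+1) - \cos\theta(j-1) = 2(1-\cos\theta)\cos\theta j$.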

Nevertheless Boltzmann thought that circumstances like this should be 
considered exceptional. Hence it will be convenient not to go immediately 
into a deeper analysis of the ergodic problem: not only because of its diffi- 
culty but mainly because it is more urgent to see how one can proceed in 
the foundations of classical statistical mechanics. 

Given a mechanical system of $N$ identical (for the sake of simplicity)
particles consider the problem of studying a fixed observable $f(p,q)$
defined on phase space.

The first important quantity that one can study, and often the only one 
that it is necessary to study, is the average value of f: 


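$$\overline{f}(\Delta) = \lim_{T\to\infty} \frac{1}{T} \sum_{k=0}^{T-1} f(S^k\Delta) \tag{1.3.5}$$
where $f(\Delta) = f(p,q)$ if $(p,q)$ is a point determining the cell
$\Delta$. If $\Delta_1 = \Delta, \Delta_2, \ldots, \Delta_{\mathcal{N}}$ is
the cycle to which the cell $\Delta$ belongs, then:


$$\overline{f}(\Delta) = \frac{1}{\mathcal{N}} \sum_{k=1}^{\mathcal{N}} f(\Delta_k) \tag{1.3.6}$$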

and in the ergodic case the cycle consists of the set of all cells with the
same energy as $\Delta$.

If the system energy is determined up to a macroscopic error $DE$,
macroscopically negligible (but large with respect to the microscopic
indetermination of energy $\delta E$, (1.2.14)), the cells with energy between
$E - DE$ and $E$ will be divided into cycles with (slightly) different
energies. On each of the cycles the function $f$ can be supposed to have the
“continuity” property of having the same average value (i.e. energy
independent up to negligible variations).

Hence, denoting by the symbol $J_E$ the domain of the variables $(p,q)$ where
$E - DE \le E(p,q) \le E$ holds, one finds:


$$\overline{f} = \int_{J_E} f(p,q)\,dp\,dq \Big/ \int_{J_E} dp\,dq. \tag{1.3.7}$$


Recalling in fact that the cells all have the same volume, (1.3.7) follows
immediately from (1.3.6) and from the assumed negligibility of the dependence
of $\overline{f}(\Delta)$ on $E(\Delta)$, provided $h$ is so small that the
sum over the cells can be replaced by an integral.

The above relation, which Boltzmann conjectured ([Bo71b], [Bo84]) to be 
always valid “discarding exceptional cases” (like the harmonic oscillator 
chain just described) and wrote in the suggestive form, [Bo71b], and p. 25 
in [EE11]: 


$$\lim_{T\to\infty} \frac{dt}{T} = \frac{dp\,dq}{\int_{J_E} dp\,dq} \tag{1.3.8}$$


is read “the time average of an observable equals its average on the surface
of constant energy”. As we shall see (§1.6), (1.3.8) provides a heuristic
basis of the microcanonical model for classical thermodynamics.

Note that if (1.3.8) holds, i.e. if (1.3.7) holds, the average value of an
observable will depend only upon $E$ and not on the particular phase space
cell $\Delta$ in which the system is found initially. The latter property is
certainly a prerequisite that any theory aiming at deducing macroscopic
properties of matter from the atomic hypothesis must possess.

It is, in fact, obvious that such properties cannot depend on the detailed
microscopic properties of the configuration $\Delta$ in which the system
happens to be at the initial time of our observations.

It is also relevant to note that in (1.3.6) the microscopic dynamics has
disappeared: it is in fact implicit in the phase space cell enumeration, made
so that $\Delta_1, \Delta_2, \Delta_3, \ldots$ are the cells into which
$\Delta$ successively evolves at time intervals $\tau$. But it is clear that
in (1.3.6) the order of such enumeration is not important and the same result
would follow if the phase space cells with the same energy were enumerated
differently.

Hence we can appreciate the fascination that the ergodic hypothesis exercises
in apparently freeing us from the necessity of knowing the details of
the microscopic dynamics, at least for the purposes of computing the averages
of the observables. That this turns out to be an illusion, already clear to
Boltzmann (see for instance p. 206 in [Bo74]), will emerge from the analysis
carried out in the following sections.


§1.4. Recurrence Times and Macroscopic Observables


In applications it has always been of great importance to be able to estimate
the rapidity at which the limit $\overline{f}$ is reached: in order that
(1.3.7) be useful it is necessary that the limit in (1.3.5) be attained within
a time interval $t$ which might be long compared to the microscopic $\tau$ but
which should still be very short compared to the time intervals relevant for
macroscopic observations that one wants to make on the system. It is, in fact,
only on scales of the order of the macroscopic times $t$ that the observable
$f$ may appear as constant and equal to its average value.

It is perfectly possible to conceive of a situation in which the system is
ergodic, but the value $f(S^k\Delta)$ is ever changing, along the
trajectories, so that the average value of $f$ is reached on time scales of
the order of magnitude of the time necessary to visit the entire surface of
constant energy. The latter is necessarily enormous.

For instance, referring to the orders of magnitude discussed at the end
of §1.2, see the values of $\delta p, \delta q$ preceding (1.2.16) and
(1.2.16) itself, we can estimate this time by computing the number of cells
with volume $h^{3N}$ contained in the region between $E$ and $E + \delta E$
and then multiplying the result by the characteristic time $h/k_BT$ in
(1.2.16), [Bo96], [Bo97].

If the surface of the $d$-dimensional unit sphere is written
$2\sqrt{\pi}^{\,d}\,\Gamma(d/2)^{-1}$ (with $\Gamma$ Euler’s Gamma function)
then the volume of the mentioned region, if $h$ is very small, can be computed
by using polar coordinates in momentum space. The cells are those such that
$P = \sqrt{\sum_i p_i^2}$ varies between $P = \sqrt{2mE}$ and
$P + \delta P = \sqrt{2m(E + \delta E)}$; hence we introduce, see §1.2,
(1.2.14), the quantities:


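$$\begin{aligned}
P &= \sqrt{2mE}, & \delta p &= \frac{P}{\sqrt{N}} = \sqrt{3mk_BT}, & \frac{E}{N} &= \frac{3k_BT}{2},\\
\delta E &= 3k_BT = \frac{p\,\delta p}{m}, & \delta P &= \frac{m\,\delta E}{P} = \frac{\delta p}{\sqrt{N}}, & \delta q &= \Big(\frac{V}{N}\Big)^{1/3}
\end{aligned} \tag{1.4.1}$$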


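where $k_B$ is Boltzmann’s constant, $k_B = 1.38 \times 10^{-16}$ erg
°K$^{-1}$, $T$ is the absolute temperature, $V$ is the volume occupied by the
gas and $N$ is the particle number. One finds that the volume we are trying to
estimate is, setting $h = \delta p\,\delta q$ and using Stirling’s formula to
evaluate $\Gamma(\frac{3N}{2})$:


$$\begin{aligned}
w &= V^N \sqrt{2mE}^{\,3N-1}\,\delta P\; 2\sqrt{\pi}^{\,3N}\,\Gamma(3N/2)^{-1} \\
&= (N\delta q^3)^N\,(\sqrt{N}\,\delta p)^{3N-1}\,\delta P\;\sqrt{\pi}^{\,3N}\,2\,\Gamma(3N/2)^{-1} \\
&= (\delta p\,\delta q)^{3N}\,N^N \sqrt{N}^{\,3N-2}\,\sqrt{\pi}^{\,3N}\,2\,\Gamma(3N/2)^{-1} \\
&\simeq h^{3N}\,N^{\frac{5N}{2}-1}\Big(\frac{2\pi e}{3N}\Big)^{3N/2} \simeq h^{3N}\,N^{N-1}\Big(\frac{2\pi e}{3}\Big)^{3N/2}
\end{aligned} \tag{1.4.2}$$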

The number of cells $\mathcal{N}$ would then be $w/h^{3N}$. But we shall
assume, on the grounds of particle indistinguishability, that cells differing
because individual particles are permuted are in fact identical. Then (1.4.2)
has to be divided by $N! \simeq N^N e^{-N}\sqrt{2\pi N}$ and therefore the
recurrence time, if the system did move ergodically on the surface of energy
$E$, would be:


$$T_{\text{recurrence}} = \mathcal{N}\tau \simeq \tau \Big(\frac{2\pi e^{5/3}}{3}\Big)^{3N/2} \tag{1.4.3}$$
As discussed in §1.2 the order of magnitude of $\tau = h/k_BT$ is, if
$T = 300$°K, of about $10^{-14}$ sec. For our present purposes it makes no
difference whether we use the expression $h = \delta p\,\delta q$ with
$\delta p, \delta q$ given in (1.4.1) with $V = 1$ cm$^3$,
$N = 2.7 \times 10^{19}$, $m = 3.34 \times 10^{-24}$ g = hydrogen molecule
mass, or whether we use Planck’s constant (see comment after (1.2.18)).

Hence, $2\pi e^{5/3}/3$ being $> 10$, the recurrence time in (1.4.3) is
unimaginably longer than the age of the Universe as soon as $N$ reaches a few
tens (still very small compared to Avogadro’s number). For instance, for
1 cm$^3$ of hydrogen at 0°C, 1 atm one has $N \simeq 10^{19}$ and
$T_{\text{recurrence}} \simeq 10^{-14} \cdot 10^{10^{19}}$ sec, while the age
of the Universe is only $\simeq 10^{17}$ sec!
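
The orders of magnitude quoted above can be reproduced with a short
calculation. The sketch below takes (1.4.3) at face value, with
$\tau = 10^{-14}$ sec and the age of the Universe set to $10^{17}$ sec as in
the text, and it works with base-10 logarithms because the recurrence time
itself overflows any floating point number. Factors polynomial in $N$ are
neglected; the function names are hypothetical.

```python
import math

TAU = 1e-14           # microscopic time scale in seconds, as quoted in the text
AGE_UNIVERSE = 1e17   # age of the Universe in seconds, as quoted in the text
BASE = 2.0 * math.pi * math.exp(5.0 / 3.0) / 3.0   # the number 2*pi*e^(5/3)/3


def log10_recurrence_time(n_particles, tau=TAU):
    """Base-10 logarithm of tau * BASE**(3N/2), i.e. of (1.4.3) in seconds."""
    return math.log10(tau) + 1.5 * n_particles * math.log10(BASE)


def smallest_n_exceeding(t_seconds):
    """Smallest particle number whose recurrence time exceeds `t_seconds`."""
    n = 1
    while log10_recurrence_time(n) <= math.log10(t_seconds):
        n += 1
    return n


print(BASE)                                  # a number slightly above 11
print(smallest_n_exceeding(AGE_UNIVERSE))    # a few tens of particles suffice
print(log10_recurrence_time(2.7e19))         # exponent for 1 cm^3 of hydrogen
```

Under these assumptions the threshold comes out at 20 particles, and for
$N = 2.7 \times 10^{19}$ the exponent is itself of order $10^{19}$, in line
with the estimate above.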

Boltzmann’s idea to reconcile ergodicity with the observed rapidity of the
approach to equilibrium was that the interesting observables, the macroscopic
observables, had an essentially constant value on the surface of given
energy with the exception of an extremely small fraction $\varepsilon$ of the
cells, [Bo74], p. 206. See §1.7 below for further comments.

Hence the time necessary to attain the asymptotic average value will not
be of the order of magnitude of the hyperastronomic recurrence time, but
rather of the order of $T' = \varepsilon\,T_{\text{recurrence}}$. And one
should think that $\varepsilon \to 0$ as the number of particles grows and
that $T'$ is very many orders of magnitude smaller than
$T_{\text{recurrence}}$ so that it becomes observable on “human” time scales,
see §1.8 for a quantitative discussion (actually $T'$ sets, essentially by
definition, the size of the human time scale).

Examples of important macroscopic observables are: 


• (1) the ratio between the number of particles located in a small cube $Q$
and the volume of $Q$: this is an observable that will be denoted $\rho(Q)$
and its average value has the interpretation of density in $Q$;


• (2) the sum of the kinetic energies of the particles: $K(\Delta) = \sum_i p_i^2/2m$;
• (3) the total potential energy of the system: $\Phi(q) = \sum_{i<j} \varphi(q_i - q_j)$;


• (4) the number of particles in a small cube $Q$ adjacent to the container
walls, and having a negative component of velocity along the inner normal
with value in $[-v, -(v + dv)]$, $v > 0$. This number divided by the volume
of $Q$ is the “density” $n(Q,v)\,dv$ of particles with normal velocity $-v$
that are about to collide with the external walls of $Q$. Such particles will
cede a momentum $2mv$ normally to the wall at the moment of their collision
with the wall (as their momentum will change from $-mv$ to $mv$). Consider the
observable defined by the sum over the values of $v$ and over the cubes $Q$
adjacent to the boundary of the container $V$:


$$\sum_Q \int_{v>0} dv\; n(Q,v)\,(2mv)\,v\,\frac{s}{S} = P(\Delta) \tag{1.4.4}$$


with $s$ = area of a face of $Q$ and $S$ = area of container surface: this is
the momentum transferred, per unit time and surface area, to the wall (note
that the number of collisions with the wall per unit time on the face $s$ of
$Q$ adjacent to the walls and with normal velocity $v$ is $n(Q,v)\,v\,s\,dv$).
The quantity (1.4.4) is an observable (i.e. a function on phase space) whose
average value has the interpretation of macroscopic pressure, therefore it
can be called the “microscopic pressure” in the phase space point $(p,q)$.


• (5) the product $\rho(Q)\rho(Q')$ is also interesting and its average value
is called the density pair correlation function between the cubes $Q$, $Q'$.
Its average value provides information on the joint probability of finding
simultaneously a particle in $Q$ and one in $Q'$.
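
Some of the observables listed above are simple enough to be evaluated
directly on a microscopic configuration. The sketch below is only meant to
make the definitions of items (1), (2) and (5) concrete: positions and momenta
are assumed to be given as lists of 3-tuples, a cube is specified by its
lowest corner and its side, and all function names are hypothetical.

```python
def kinetic_energy(momenta, mass):
    """K = sum_i p_i^2 / 2m for a list of momenta (px, py, pz), as in item (2)."""
    return sum(px * px + py * py + pz * pz for (px, py, pz) in momenta) / (2.0 * mass)


def density_in_cube(positions, corner, side):
    """rho(Q): number of particles in the cube Q divided by the volume of Q, item (1)."""
    inside = sum(
        1
        for pos in positions
        if all(c <= x < c + side for x, c in zip(pos, corner))
    )
    return inside / side ** 3


def density_product(positions, corner_a, corner_b, side):
    """rho(Q) * rho(Q') for two cubes of equal side; its average is the pair correlation, item (5)."""
    return density_in_cube(positions, corner_a, side) * density_in_cube(positions, corner_b, side)


positions = [(0.1, 0.1, 0.1), (0.5, 0.5, 0.5), (0.9, 0.2, 0.3)]
momenta = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
print(kinetic_energy(momenta, mass=2.0))
print(density_in_cube(positions, corner=(0.0, 0.0, 0.0), side=0.5))
```

Each function returns the value of the observable on one configuration; the
macroscopic quantity is obtained only after averaging over time or over a
stationary distribution.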


§1.5. Statistical Ensembles or “Monodes” and Models of Thermodynamics.
Thermodynamics without Dynamics


From a more general viewpoint and without assuming the ergodic hypoth- 
esis it is clear that the average value of an observable will always exist and 
it will be equal to its average over the cycle containing the initial datum, 
see (1.3.6). 

For a more quantitative formulation of this remark we introduce the notion
of stationary distribution: it is a function associating with each phase space
cell $\Delta$ a number $\mu(\Delta)$ (probability or measure of $\Delta$) so
that:


$$\mu(\Delta) \ge 0, \qquad \sum_{\Delta} \mu(\Delta) = 1, \qquad \mu(\Delta) = \mu(S\Delta) \tag{1.5.1}$$


if S is the time evolution map which permutes the cells, see §1.2 and §1.3. 

One usually says that $\mu$ is an invariant probability distribution or a
stationary probability distribution on phase space (or, better, on phase space
cells). The following definition will be convenient (see §1.3):


Definition: Let $\mu$ be an invariant probability distribution on phase space
cells. If the dynamics map $S$ acts as a one-cycle permutation of the set of
cells $\Delta$ for which $\mu(\Delta) > 0$ then $\mu$ is called ergodic.


If one imagines covering phase space with a fluid so that the fluid mass
in $\Delta$ is $\mu(\Delta)$ and if the phase space points are moved by the
permutation $S$ associated with the dynamics then the fluid looks immobile,
i.e. its distribution on phase space remains invariant (or stationary) as time
goes by: this gives motivation for the name used for $\mu$.

It is clear that $\mu(\Delta)$ must have the same value on all cells belonging
to the same cycle $C_\alpha$ of the permutation $S$ (here $\alpha$ is a label
distinguishing the various cycles of $S$). If $\mathcal{N}(C_\alpha)$ is the
number of cells in the cycle $C_\alpha$ it must, therefore, be that
$\mu(\Delta) = p_\alpha/\mathcal{N}(C_\alpha)$, with $p_\alpha \ge 0$ and
$\sum_\alpha p_\alpha = 1$, for $\Delta \in C_\alpha$.

It is useful to define, for each cycle $C_\alpha$ of $S$, an (ergodic)
stationary distribution $\mu_\alpha$ by setting:


$$\mu_\alpha(\Delta) = \begin{cases} 1/\mathcal{N}(C_\alpha) & \text{if } \Delta \in C_\alpha \\ 0 & \text{otherwise} \end{cases} \tag{1.5.2}$$

and this allows us to think that any invariant probability distribution is a
linear combination of the $\mu_\alpha$ associated with the various cycles of
$S$:


$$\mu(\Delta) = \sum_\alpha p_\alpha\,\mu_\alpha(\Delta), \tag{1.5.3}$$


where $p_\alpha \ge 0$ are suitable coefficients with
$\sum_\alpha p_\alpha = 1$, which can be called the “probabilities of the
cycles” in the distribution $\mu$. Note that, by definition, each of the
distributions $\mu_\alpha$ is ergodic because it gives a positive probability
only to cells that are part of the same cycle (namely $C_\alpha$).

The decomposition (1.5.3) of the most general $S$-invariant distribution $\mu$
as a sum of $S$-ergodic distributions is naturally called the ergodic
decomposition of $\mu$ (with respect to the dynamics $S$).
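
The ergodic decomposition can be written out explicitly on a small example.
The sketch below assumes cells labelled 0, ..., n-1, a permutation `perm`
playing the role of $S$, and an invariant distribution `mu` stored as a list
of exact fractions; `cycles_of` and `ergodic_decomposition` are hypothetical
names. It returns, for each cycle, the weight $p_\alpha$ and the uniform
distribution $\mu_\alpha$ of (1.5.2), and then rebuilds $\mu$ through (1.5.3).

```python
from fractions import Fraction


def cycles_of(perm):
    """List the cycles of the permutation `perm`, each as a list of cells."""
    seen = [False] * len(perm)
    cycles = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        cycle = []
        cell = start
        while not seen[cell]:
            seen[cell] = True
            cycle.append(cell)
            cell = perm[cell]
        cycles.append(cycle)
    return cycles


def ergodic_decomposition(perm, mu):
    """Return the pairs (p_alpha, mu_alpha) of (1.5.2), (1.5.3) for an invariant mu."""
    if any(mu[perm[c]] != mu[c] for c in range(len(perm))):
        raise ValueError("mu is not invariant under perm")
    parts = []
    for cycle in cycles_of(perm):
        p_alpha = sum(mu[c] for c in cycle)          # probability of the cycle
        mu_alpha = [Fraction(0)] * len(perm)
        for c in cycle:
            mu_alpha[c] = Fraction(1, len(cycle))    # uniform on the cycle
        parts.append((p_alpha, mu_alpha))
    return parts


perm = [1, 2, 0, 4, 3]                               # cycles (0 1 2) and (3 4)
mu = [Fraction(1, 6)] * 3 + [Fraction(1, 4)] * 2     # constant on each cycle
parts = ergodic_decomposition(perm, mu)
rebuilt = [sum(p * m[c] for p, m in parts) for c in range(len(perm))]
print(rebuilt == mu)
```

A distribution that is not constant on some cycle is rejected, since it could
not satisfy $\mu(\Delta) = \mu(S\Delta)$.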

In the deep paper [Bo84] Boltzmann formulated the hypothesis that stationary
distributions $\mu$ could be interpreted as macroscopic equilibrium
states so that the set of macroscopic equilibrium states could be identified
with a subset $\mathcal{E}$ of the stationary distributions on phase space
cells. The current terminology refers to this concept as an ensemble, after
Gibbs, while Boltzmann used the word monode. We shall call it an ensemble or a
statistical ensemble.

Identification between an individual stationary probability distribution
$\mu$ on phase space and a corresponding macroscopic equilibrium state takes
place by identifying $\mu(\Delta)$ with the probability of finding the system
in the cell (i.e. in the microscopic state) $\Delta$ if one performed, at a
randomly chosen time, the observation of the microscopic state.

Therefore the average value in time, in the macroscopic equilibrium state
described by $\mu$, of a generic observable $f$ would be:


$$\overline{f} = \sum_{\Delta} \mu(\Delta)\,f(\Delta). \tag{1.5.4}$$
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This relation correctly gives, in principle, the average value of f in time, if 
the initial data are chosen randomly with a distribution $\mu$ which is
ergodic. But in general even if $\mu$ is ergodic one should not think that
(1.5.4) is directly related to the physical properties of $\mu$. This was
already becoming clear in §1.3 and §1.4 when we referred to the length of the
recurrence times and hence to the necessity of further assumptions to derive
(1.3.7), (1.3.8). We shall come back to (1.5.4) and to the ergodic hypothesis
in §1.6.

Returning to Boltzmann’s statistical ensembles, he raised the following
question in the paper [Bo84]: letting aside the ergodic hypothesis or any
other attempt at a dynamical justification of (1.5.4), consider all possible
statistical ensembles $\mathcal{E}$ of stationary distributions on phase
space. Fix $\mathcal{E}$ and, for each $\mu \in \mathcal{E}$, define:


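$$\begin{aligned}
\Phi(\mu) &= \sum_{\Delta} \mu(\Delta)\,\Phi(\Delta) = \text{“average potential energy”},\\
T(\mu) &= \sum_{\Delta} \mu(\Delta)\,K(\Delta) = \text{“average kinetic energy”},\\
U(\mu) &= T(\mu) + \Phi(\mu) = \text{“average total energy”},\\
P(\mu) &= \sum_{\Delta} \mu(\Delta)\,P(\Delta) = \text{“pressure”, see (1.4.4)},\\
\rho(\mu) &= N/V = \rho \equiv 1/v = \text{“density”},\\
V &= \int dq = \text{“volume”},
\end{aligned} \tag{1.5.5}$$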

where V is the volume assigned to the system (i.e. the volume of the con- 
tainer) and N is the particle number. 


Question (“orthodicity problem”): which statistical ensembles, or monodes,
$\mathcal{E}$ have the property that as $\mu$ changes infinitesimally within
$\mathcal{E}$ the corresponding infinitesimal variations $dU$, $dV$ of
$U = U(\mu)$ and $V$, see (1.5.5), are related to the pressure $P = P(\mu)$
and to the average kinetic energy per particle $\overline{T} = T(\mu)/N$ so
that:


$$\frac{dU + P\,dV}{\overline{T}} = \text{exact differential} \tag{1.5.6}$$


at least in the thermodynamic limit in which the volume $V \to \infty$ and
also $N, U \to \infty$ so that the densities $N/V$, $U/V$ remain constant
(assuming for simplicity that the container keeps cubic shape).


Ensembles (or monodes) satisfying the property (1.5.6) were called by
Boltzmann orthodes: they are, in other words, the statistical ensembles
$\mathcal{E}$ in which it is possible to interpret the average kinetic energy
per particle, $\overline{T}$, as proportional to the absolute temperature $T$
(via a proportionality constant, to be determined empirically and
conventionally denoted $(2/3k_B)^{-1}$, so that
$T = \frac{2}{3k_B}\overline{T}$); and furthermore it is possible to define
via (1.5.6) a function $S(\mu)$ on $\mathcal{E}$ so that the observables
$U, \rho, T, V, P, S$ satisfy the relations that the classical thermodynamics
quantities with the corresponding name satisfy, at least in the thermodynamic
limit.

In this identification the function $S(\mu)$ would become, naturally, the
entropy and the validity of (1.5.6) would be called the second law.

In other words Boltzmann posed the question of when it would be possible
to interpret the elements $\mu$ of a statistical ensemble $\mathcal{E}$ of
stationary distributions on phase space as macroscopic states of a system
governed by the laws of classical thermodynamics.

The ergodic hypothesis combined with the other assumptions used in §1.3
to deduce (1.3.7), (1.3.8) leads us to think that the statistical ensemble
$\mathcal{E}$ consisting of the distributions $\mu$ on phase space defined by:


$$\mu(\Delta) = \begin{cases} 1/\mathcal{N}(U,V) & \text{if } E(\Delta) \in (U - DE, U) \\ 0 & \text{otherwise} \end{cases} \tag{1.5.7}$$
where $U, V$ are prefixed parameters corresponding to the total energy and
volume of the system, should necessarily be a statistical ensemble apt to
describe the macroscopic equilibrium states. Here $\mathcal{N}(U,V)$ is a
normalization constant to be identified as proportional to the integral
$\int dp\,dq$ over the region $J_E$ of $p, q$ in which
$E(p,q) \in (U - DE, U)$; and the parameter $DE$ is “arbitrary” as discussed
before (1.3.7).

However the orthodicity or nonorthodicity of a statistical ensemble
$\mathcal{E}$ whose elements are parameterized by $U, V$ as in (1.5.7) is
“only” the question of whether (1.5.6) (second law) holds or not and this
problem is not, in itself, logically or mathematically related to any
microscopic dynamics property.

The relation between orthodicity of a statistical ensemble and the hy- 
potheses on microscopic dynamics (like the ergodic hypothesis) that would 
a priori guarantee the physical validity of the ensuing model of thermody- 
namics will be reexamined in more detail at the end of §1.6. 

If there were several orthodic statistical ensembles then each of them would 
provide us with a mechanical microscopic model of thermodynamics: of 
course if there were several possible models of thermodynamics (i.e. several 
orthodic statistical ensembles) it should also happen that they give equiva- 
lent descriptions, i.e. that they give the same expression to the entropy S as 
a function of the other thermodynamic quantities, so that thermodynamics 
would be described in mechanical terms in a nonambiguous way. This check 
is therefore one of the main tasks of the statistical ensembles theory. 

It appears that in attempting to abandon the (hard) fundamental aim of
founding thermodynamics on microscopic dynamics one shall nevertheless
not avoid having to attack difficult questions like that of the nonambiguity
of the thermodynamics that corresponds to a given system. The latter is a
problem that has been studied and solved in various important cases, but we 
are far from being sure that such cases (the microcanonical or the canonical 
or the grand canonical ensembles to be discussed below, and others) exhaust 
all possible ones. Hence a “complete” understanding of this question could 
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reveal itself equivalent to the dynamical foundation of thermodynamics: the 
very problem that one is hoping to circumvent by deciding to “only” build 
a mechanical model of thermodynamics, i.e. an orthodic ensemble. 


§1.6. Models of Thermodynamics. Microcanonical & Canonical
Ensembles and the Ergodic Hypothesis.


The problem of the existence of statistical ensembles (i.e. families of
stationary probability distributions on phase space) that provide mechanical
models of thermodynamics² was solved by Boltzmann in the same paper
quoted above, [Bo84] (following earlier basic papers on the canonical
ensemble [Bo71a], [Bo71b] where the notion of ensemble seems to appear for
the first time).

Here Boltzmann showed that the statistical ensembles described below and 
called, after Gibbs, the microcanonical and the canonical ensemble are or- 
thodic, i.e. they define a microscopic model of thermodynamics in which the 
average kinetic energy per particle is proportional to absolute temperature 
(see below and §1.5). 


(1) The microcanonical ensemble 


It was named in this way by Gibbs while Boltzmann referred to it by the
still famous, but never used, name of ergode. The microcanonical ensemble
consists in the collection $\mathcal{E}$ of stationary distributions $\mu$
parameterized by two parameters $U$ = total energy and $V$ = system volume so
that, see (1.5.2):


$$\mu(\Delta) = \begin{cases} 1/\mathcal{N}(U,V) & \text{if } U - DE \le E(\Delta) \le U \\ 0 & \text{otherwise} \end{cases} \tag{1.6.1}$$
with:
$$\mathcal{N}(U,V) = \sum_{U - DE \le E(\Delta) \le U} 1 = \{\text{number of cells } \Delta \text{ with energy } E(\Delta) \in (U - DE, U)\} \tag{1.6.2}$$


where the quantity DE has to be a quantity, possibly V-dependent, “macro- 
scopically negligible” compared to U, such that one may think that all cells 
with energy between U — DE and U have the “same energy” from a macro- 
scopic point of view. 

The importance of the microcanonical ensemble in the relation between 
classical thermodynamics and the atomic hypothesis is illustrated by the 
argument leading to (1.3.8) which proposes it as the natural candidate for 
an example of an orthodic ensemble. 


2 At least in the thermodynamic limit, see (1.5.6), in which the volume becomes infinite 
but the average density and energy per particle stay fixed. 
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However, as discussed in §1.5, the argument leading to (1.3.7), (1.3.8) 
cannot possibly be regarded as a “proof on physical grounds” of orthodicity 
of the microcanonical ensemble.³

Following the general definition in §1.5 of orthodic statistical ensemble,
i.e. of an ensemble generating a model of thermodynamics, we can define the
“absolute temperature” and the “entropy” of every element $\mu$ (“macroscopic
state”) so that the temperature $T$ is proportional to the average kinetic
energy. Boltzmann showed that such functions $T$ and $S$ are given by the
celebrated relations:


$$T = \frac{2}{3k_B}\,\frac{T(\mu)}{N}, \qquad S(\mu) = k_B \log \mathcal{N}(U,V) \tag{1.6.3}$$
where $k_B$, “Boltzmann’s constant”, is a universal constant to be empirically
determined by comparison between theory and experiment.⁴ The factor
$\frac{2}{3}$ is conventional and its choice simplifies some of the following
formulae, besides the second of (1.6.3).⁵

The statement that (1.6.1), (1.6.2) provide us with a microscopic model of
thermodynamics in the thermodynamic limit $V \to \infty$, $U \to \infty$,
$N \to \infty$ so that $u = U/N$, $v = V/N$ remain constant has to be
interpreted as follows. One evaluates, starting from (1.6.1)-(1.6.3) (see
also (1.5.5)):


$$\begin{aligned}
u &= U/N = \text{“specific energy”}, & v &= V/N = \text{“specific volume”},\\
T &= 2T(\mu)/3k_BN = \text{“temperature”}, & s &= S(\mu)/N = \text{“entropy”},\\
p &= P(\mu) = \text{“pressure”}.
\end{aligned} \tag{1.6.4}$$


Since the quantities $u, v$ determine $\mu \in \mathcal{E}$ it will be
possible to express $T, p, s$ in terms of $u, v$ via functions $T(u,v)$,
$P(u,v)$, $s(u,v)$ that we shall suppose to admit a limit value in the
thermodynamic limit (i.e. $V \to \infty$ with fixed $u, v$).

Then to say that (1.6.1), (1.6.2) give a model of thermodynamics means 
(see also §1.5) that such functions satisfy the same relations that link the 
quantities with the same name in classical thermodynamics, namely: 


$$du = T\,ds - P\,dv. \tag{1.6.5}$$


Equation (1.6.5) is read as follows: if the state $\mu$ defined by (1.6.1),
(1.6.2) is subject to a small variation by changing the parameters $U, V$
that define it, then the corresponding variations of $u, s, v$ verify (1.6.5),
i.e. the second principle of thermodynamics: see Chap.II for a discussion and
a proof of (1.6.3), (1.6.5).

The proof of a statement like (1.6.5) for the ensemble $\mathcal{E}$ was
called, by Boltzmann, a proof of the heat theorem.


3 Which it is worth stressing once more does not depend on the microscopic dynamics. 
4 As already said and as it will be discussed later, one finds $k_B = 1.38 \times 10^{-16}$ erg °K$^{-1}$.
5 Mainly it simplifies the relation between $T$ and $\beta$ in the first of (1.6.8) below.
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(2) The canonical ensemble. 


The name was introduced by Gibbs, while Boltzmann referred to it with the
name of holode. It consists in the collection $\mathcal{E}$ of stationary
distributions $\mu$ parameterized by two parameters $\beta$ and $v = V/N$,
via the definition:


$$\mu(\Delta) = \frac{\exp(-\beta E(\Delta))}{Z(\beta,V)} \tag{1.6.6}$$


with
$$Z(\beta,V) = \sum_{\Delta} \exp(-\beta E(\Delta)). \tag{1.6.7}$$


Boltzmann proved the proportionality between $T(\mu)$ and $\beta^{-1}$ as
well as the orthodicity of this statistical ensemble by showing that
temperature and entropy can be defined by


$$T = 2T(\mu)/3k_BN = 1/k_B\beta, \qquad S = k_B\,(\beta U + \log Z(\beta,V)) \tag{1.6.8}$$


where $k_B$ is a universal constant to be empirically determined.

The statement that (1.6.6), (1.6.8) provide us with a model of
thermodynamics, in the thermodynamic limit $V \to \infty$, $V/N \to v$,
$\beta$ = constant, has the same meaning discussed in the previous case of the
microcanonical ensemble. See Chap.II for the analysis of the orthodicity of
the canonical ensemble, i.e. for a proof of the heat theorem for the canonical
ensemble.
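
As a plausibility check, and not as a substitute for the proof in Chap.II,
the heat theorem can be tested numerically in the simplest case. The sketch
below assumes an ideal gas (no interaction), units in which
$k_B = h = m = 1$, Stirling’s formula for $N!$, and the entropy taken in the
standard form $S = k_B(\beta U + \log Z)$; with these assumptions
$\frac{1}{N}\log Z = \log v + \frac{3}{2}\log(2\pi m/\beta h^2) + 1$,
$u = 3/2\beta$ and $p = 1/\beta v$. The function names are hypothetical.

```python
import math

K_B = 1.0    # Boltzmann's constant in the chosen units (assumption)
H = 1.0      # cell size constant h in the chosen units (assumption)
MASS = 1.0   # particle mass in the chosen units (assumption)


def log_z_per_particle(beta, v):
    """(1/N) log Z for an ideal gas, using Stirling's formula for N!."""
    return math.log(v) + 1.5 * math.log(2.0 * math.pi * MASS / (beta * H * H)) + 1.0


def state(beta, v):
    """Return (u, p, T, s) per particle in the canonical ensemble of an ideal gas."""
    u = 1.5 / beta                      # average energy per particle
    p = 1.0 / (beta * v)                # pressure
    temperature = 1.0 / (K_B * beta)    # first relation in (1.6.8)
    s = K_B * (beta * u + log_z_per_particle(beta, v))
    return u, p, temperature, s


def heat_theorem_residual(beta, v, dbeta=1e-6, dv=1e-6):
    """du - (T ds - p dv) for a small change of the parameters (beta, v)."""
    u0, p0, t0, s0 = state(beta, v)
    u1, _, _, s1 = state(beta + dbeta, v + dv)
    return (u1 - u0) - (t0 * (s1 - s0) - p0 * dv)


print(heat_theorem_residual(1.0, 2.0))
```

The residual is of second order in the increments, as expected when
$du = T\,ds - p\,dv$ holds to first order.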


The relations (1.6.5) hold, as already pointed out, for both ensembles con- 
sidered, hence each of them gives a microscopic “mechanical” model of clas- 
sical thermodynamics. 

Since entropy, pressure, temperature, etc., are in both cases explicitly
expressible in terms of two independent parameters ($u, v$ or $\beta, v$) it
will be possible to compute the equation of state (i.e. the relation between
$p$, $v$ and $T$) in terms of the microscopic properties of the system, at
least in principle:
this is enormous progress with respect to classical thermodynamics where 
the equation of state always has a phenomenological character, i.e. it is a 
relation that can only be deduced by means of experiments. 

It is clear, however, that the models of thermodynamics described above
must meet, to be acceptable as physical theories, the basic prerequisite
of defining not only a possible thermodynamics⁶ but also the
thermodynamics of the system, which is experimentally accessible. One can
call the check of the two prerequisites a check of theoretical and
experimental consistency, respectively.

For this it is necessary, first, that the two models of thermodynamics co- 
incide (i.e. lead to the same relations between the basic thermodynamic 
quantities u, v, T, P, s) but it is also necessary that the two models agree 
with the experimental observations. 


6 I.e. a thermodynamics that does not come into conflict with the basic
principles, expressed by (1.6.5).
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But a priori there are no reasons that imply that the above two prerequi- 
sites hold. 

Here it is worthwhile to get more deeply into the questions we raised in 
connection with (1.3.7) and to attempt a justification of the validity of the 
microcanonical ensemble as a model for thermodynamics with physically 
acceptable consequences and predictions. This leads us once more to discuss 
the ergodic hypothesis that is sometimes invoked at this point to guarantee 
a priori or to explain a posteriori the success of theoretical and experimental 
consistency checks, whose necessity has been just pointed out. 

In §1.3 we have seen how the microcanonical distribution could be justi- 
fied as describing macroscopic equilibrium states on the basis of the ergodic 
hypothesis and of a continuity property of the averages of the relevant ob- 
servables (see the lines preceding (1.3.7)): in that analysis, leading to (1.3.7), 
we have not taken into account the time scales involved. Their utmost im- 
portance has been stated in §1.4: if (1.3.7) held but the average value over 
time of the observable f, given by the right-hand side of (1.3.7), was at- 
tained in a hyperastronomic time, comparable to the one given by (1.4.3), 
then (1.3.7) would, obviously, have little practical interest and value. 


§1.7. Critique of the Ergodic Hypothesis


Summarizing: to deduce (1.3.7), hence for an a priori justification of the 
connection between the microcanonical ensemble and the set of states of 
macroscopic thermodynamic equilibrium, one meets three main difficulties. 


• The first is a verification of the ergodic hypothesis, §1.3, as a
mathematical problem.


• The second is that, even accepting the ergodic hypothesis for the cyclicity
of the dynamics on the surface with constant energy (i.e. with energy fixed
within the microscopic uncertainty $\delta E$), one has to solve the difficulty
that, in spite of the ergodicity, the elements of the microcanonical ensemble
are not ergodic: this (trivial) nonergodicity is due to the fact that in
the microcanonical ensemble the energy varies by a small but macroscopic
quantity $DE \gg \delta E$.


• The third is that, in any event, it would seem that enormous times are
needed before the fluctuations of the time averages over finite times stabilize
around the equilibrium limit value (times enormously longer than the age
of the Universe).


The three difficulties would be solved if one supposed that, simultaneously: 


(i) the phase space cells with fixed energy (microscopically fixed) are part
of a single cycle of the dynamics $S$: this is the ergodic hypothesis, see §1.3.


(ii) the values of the “relevant” macroscopic observables are essentially 
the same on cells corresponding to a given macroscopic value of the energy 
E, with the possible exception of a small fraction of the number of cells,
negligible for large systems i.e. in the thermodynamic limit. 


(iii) the common average value that the relevant observables assume on
trajectories of cells of energy $E$ changes only slightly as the total energy
changes between the values $U$ and $U - DE$, if $U$ and $DE$ are two macroscopic
values with $U \gg DE$ (but $DE \gg \delta E$). This can be called a continuity
assumption.


The hypotheses (i) and (iii), see §1.3, show that the average values of
the macroscopic observables can be computed by using, equivalently, any
ergodic component of a given microcanonical distribution $\mu$.

Hypothesis (ii) allows us to say that the time necessary in order that an
average value of an observable be attained, if computed on the evolution
of a particular microscopic state $\Delta$, is by far shorter than the recurrence
time (too long to be interesting or relevant). The region of phase space
where macroscopic observables take the equilibrium value has sometimes
been pictorially called “Boltzmann’s sea” (see [Bo74], p. 206, and [Ul68],
p. 3, fig. 2).

Accepting (i), (ii) and (iii) implies (by the physical meaning that u,p,v 
acquire) that the microcanonical ensemble must provide a model for ther- 
modynamics in the sense that dU + pdV must admit an integrating factor 
(to be identified with the absolute temperature). The fact that this factor 
turns out to be proportional to the average kinetic energy is, from this view- 
point (and only in the case of classical statistical mechanics as one should 
always keep in mind), a consequence (as we shall show in Chap.II). 

One can remark that assumptions (ii) and (iii) are assumptions that do 
not involve explicitly the dynamical properties, at least on a qualitative 
level: one says that they are equilibrium properties of the system. And it 
is quite reasonable to think that they are satisfied for the vast majority of 
systems encountered in applications, because in many cases it is possible 
to really verify them, sometimes even with complete mathematical rigor, 
[Fi64], [Ru69].

Hence the deeper assumption is in (i), and it is for this reason that
sometimes, quite improperly, it is claimed that the ergodic hypothesis is “the
theoretical foundation for using the microcanonical ensemble as a model for
the equilibrium states of a system”.

The improper nature of the above locution lies in the fact that (i) can be 
greatly weakened without leading to a modification of the inferences on the 
microcanonical ensemble. 

For instance one could simply require that only the time averages of a few
macroscopically interesting observables should have the same value on every
cycle (or on the great majority of cycles) of the dynamics with a fixed energy.

This can be done while accepting the possibility of many different cycles 
(on which non macroscopically interesting observables would take differ- 
ent average values). An essentially exhaustive list of the “few” interesting 
observables for monoatomic gases is given by (1.5.5).

Furthermore the above-mentioned locution is improper also because, even 
if one accepts it, it cannot release us from checking (ii), (iii) which, in 
particular, require a quantitative verification: evidently one cannot be sat- 
isfied with a simple qualitative verification since the orders of magnitude 
involved are very different. One could, in fact, raise doubts that the time 
“for reaching equilibrium” could really come down from the recurrence times 
(superastronomical) to the times experimentally recorded (usually of a few 
microseconds). 

As for the canonical ensemble, its use could be justified simply
by proving that it leads to the same results that one obtains by using the
microcanonical ensemble, at least in the thermodynamic limit and for the
few interesting observables (see above for a list).

But, as already mentioned, the ergodic hypothesis (with or without the 
extra two assumptions (ii), (iii) above) is technically too difficult to study 
and for this reason an attempt has been made to construct models of ther- 
modynamics while avoiding solving, even if partially, the ergodic problem. 

The proposal is simply to prove that all the orthodic ensembles (at least
the reasonable ones)⁷ generate the same macroscopic thermodynamics (for
instance the same equation of state). This property, by itself very notable 
and remarkable, should then be considered sufficient to postulate, by the 
“principle of sufficient reason”, that the equations of state of a system can 
be calculated from the microscopic properties (i.e. from the Hamiltonian of 
the system) by evaluating the average values of the basic observables (see 
(1.5.5)) via the distributions of the microcanonical or canonical ensembles, 
or more generally of any orthodic ensemble. 

The latter is the point of view usually attributed to Gibbs: virtually all 
the treatises on statistical mechanics are based on it. 

It is well understandable why such a point of view appeared unsatisfactory
to Boltzmann, who had the ambition of reducing thermodynamics to
mechanics without introducing any new postulate: on the other hand, the
pragmatic approach of Gibbs is also very understandable if one keeps in
mind the necessity of deducing all the applicative consequences stemming
from the marvelous discovery of the possibility of unambiguously deriving
values of thermodynamical quantities in terms of mechanical properties of
the atomic model of matter.


7 One should not think that it is difficult to devise ensembles which are orthodic and which
may seem “not reasonable” (for a thermodynamic interpretation): in fact Boltzmann’s
paper, [Bo84], on the ensembles starts with such an example involving the motion of
one of Saturn’s rings regarded as a massive line (in a parallel paper the example was the
Moon, whose orbit was replaced by an ellipse of mass such that each arc contained an
amount of mass proportional to the time spent on it by the Moon). This may have been
one of the reasons this fundamental paper has been overlooked for so many years. Such
“unphysical” examples come from Helmholtz, [He95a], [He95b], and played an important
role for Boltzmann (who was considering them in a less systematic way even much
earlier, [Bo66]). In fact if one can define the mechanical analogue of thermodynamics
for any system, small or large, then it is natural to think that in large systems the
average quantities will also satisfy the second law. And the idea (of Boltzmann) that
the macroscopic observables have the same value on most of the energy surface makes
the law easily observable in large systems, while this may not be the case in very small
systems. In other words the one-degree-of-freedom examples are not at all unphysical;
rather the contrary holds: see Appendix 1.A1 (to Chap.I) for Helmholtz’s theory and
Chap.IX for a recent application of the same viewpoint.

For the past few decades, about a century after the birth of the above 
theories, we seem to feel again the necessity of a unified derivation of ther- 
modynamics from mechanics without the artificial a priori postulate that 
thermodynamics is described by the orthodic statistical ensembles; a pos- 
tulate made possible, i.e. consistent, by the mentioned independence (dis- 
cussed in the following Chap.II) of the results as functions of the statistical 
ensemble used. 

The ergodic problem and the statistical dynamics are therefore again at 
the center of research, and are stimulating new interesting ideas and results. 

Boltzmann also tried to justify the microcanonical and canonical ensembles
by following a path rather different from that of studying the ergodic
problem and the hypotheses (i), (ii), (iii) above, [UF63]. His attempt
led him, [Bo72], to deduce Boltzmann’s equation, which proved
essential even for technical applications, although it presented and still presents
various conceptually unsatisfactory aspects; see §1.8 for a first analysis of
this equation.


§1.8. Approach to Equilibrium and Boltzmann’s Equation.
Ergodicity and Irreversibility


As discussed in the previous sections macroscopic equilibrium states can be 
identified with elements of the orthodic statistical ensembles (microcanoni- 
cal, canonical, grand canonical, ...). It is not quantitatively clear, however, 
through which mechanism a mechanical system initially in non equilibrium 
can reach equilibrium. 

We have argued that the ergodic hypothesis, by itself, is not sufficient to 
explain why a system reaches equilibrium within times usually relatively 
short. 

Boltzmann developed a model, [Bo72], for describing the approach to
equilibrium which has been strongly criticized ever since its formulation, much
like his other intuitions, and which is considered by some (perhaps incorrectly)
his greatest contribution to Science.

The validity of the model is limited to systems with so low a density that 
they can be considered rarefied gases and this shows how it can, in concrete 
cases, happen that assumptions (i), (ii), (iii) of §1.7 could be, for practical 
purposes, verified in such systems and how it could be possible that the 
interesting observables reach their average values over time scales accessible 
to our senses rather than on the absurdly long recurrence time scales. 

One imagines the system to consist of N identical particles (for simplicity), 
each of which is described by momentum p and by position q. They move 
as if free, except that from time to time they collide.

Assuming that such particles are rigid spheres with radius $R$ (again only
for simplicity) and that they have an average speed $\bar v$, the low-density
assumption is that the density $\rho = N/V$ is such that


$$\rho R^3 \ll 1 \tag{1.8.1}$$


which means that it is very unlikely that there are two particles at a distance 
of the order R, i.e. “colliding”. 

At the same time one requires that the number of collisions that each 
particle undergoes per unit time does not vanish. Evidently this number 
has order of magnitude: 


$$\rho R^2 \bar v. \tag{1.8.2}$$


Hence the limit situation in which the gas is very rarefied but, nevertheless,
the number of collisions that each particle undergoes per unit time is not
negligible, is described by


$$R \to 0, \quad \rho \to \infty \quad \text{so that} \quad \rho R^3 \to 0, \quad \rho R^2 \bar v = w = \text{fixed quantity}. \tag{1.8.3}$$


The quantity $\tau = 1/w$ is the time of flight between two collisions while the
mean free path is $\bar v \tau = 1/\rho R^2$.

The limit situation that is obtained by letting $R \to 0$ and $\rho \to \infty$ as in
(1.8.3) is called Grad’s limit. In the situation envisaged by Boltzmann one
supposes that we are “close” to this limit, i.e. one supposes that we are
close to $\rho R^3 = 0$ and $\rho R^2 \bar v = w > 0$.

It is of some interest to compute $\rho R^3$, $\tau$ and $\bar v$ for a Hydrogen sample at
atmospheric pressure and room temperature ($p$ = 1 atm, $T$ = 293 °K): one
finds $\rho R^3 = 5.8 \times 10^{-4}$, $\tau = 2.5 \times 10^{-10}$ sec, $\bar v = 1.9 \times 10^{3}$ m/sec.
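
These orders of magnitude can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the gas is treated as ideal, so that the number density is $p/k_B T$; the average speed $\bar v$ is identified with the root mean square speed; and the sphere radius `R_ASSUMED` is an illustrative guess, not a value given in the text. The helper `grad_rescale` (a name introduced here) illustrates the scaling in (1.8.3): dividing $R$ by $k$ and multiplying $\rho$ by $k^2$ leaves $\rho R^2$ fixed while $\rho R^3$ decreases by a factor $k$.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
ATM = 101325.0          # one atmosphere, Pa
M_H2 = 2 * 1.6735e-27   # approximate mass of an H2 molecule, kg


def gas_scales(pressure, temperature, mass, radius):
    """Return (rho, v_bar, rho R^3, tau, mean free path) in SI units."""
    rho = pressure / (K_B * temperature)             # ideal gas law (assumption)
    v_bar = math.sqrt(3 * K_B * temperature / mass)  # rms speed used as v_bar (assumption)
    dilution = rho * radius**3                       # left-hand side of (1.8.1)
    tau = 1.0 / (rho * radius**2 * v_bar)            # time of flight 1/w, see (1.8.2)
    free_path = v_bar * tau                          # equals 1/(rho R^2)
    return rho, v_bar, dilution, tau, free_path


def grad_rescale(rho, radius, k):
    """Rescale towards the limit (1.8.3): rho R^2 is unchanged, rho R^3 shrinks by k."""
    return rho * k**2, radius / k


R_ASSUMED = 2.8e-10  # sphere radius in m: an illustrative guess, not a value from the text
rho, v_bar, dilution, tau, free_path = gas_scales(ATM, 293.0, M_H2, R_ASSUMED)
print(f"rho R^3 = {dilution:.2e}, tau = {tau:.2e} s, v_bar = {v_bar:.2e} m/s")

rho_k, radius_k = grad_rescale(rho, R_ASSUMED, 10.0)
print(f"rescaled: rho R^2 = {rho_k * radius_k**2:.3e}, rho R^3 = {rho_k * radius_k**3:.2e}")
```

With a radius of a few ångström the three quantities come out with the same orders of magnitude as those quoted above; the exact digits depend on the assumed radius and on the definition of $\bar v$.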

Let then $f(p, q)\,dp\,dq$ be the number of particles that can be found in the
cell $Q = dp\,dq$ of the phase space describing the states of a single particle
(not to be confused with the phase space which we have been using so far,
which describes the states of $N$ particles).

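Boltzmann remarks that $f$ can change in time either by virtue of collisions
or because particles move in space. If $\varepsilon$ is a prefixed time interval, the
number of particles that at a certain instant are in the cell $Q$ is:


$$\begin{aligned}
f(p, q, t)\,dp\,dq ={}& f(p,\, q - \varepsilon p/m,\, t - \varepsilon)\,dp\,dq \\
&+ \varepsilon \sum_{Q', Q''} \bigl(\text{number of particles in } Q' \text{ that collide per unit of time with} \\
&\qquad\qquad \text{particles in } Q'' \text{ producing particles in } Q_1, Q_2 \text{ with } Q_1 = Q\bigr) \\
&- \varepsilon \sum_{Q_2, Q', Q''} \bigl(\text{number of particles in } Q_1 = Q \text{ that collide per unit of time} \\
&\qquad\qquad \text{with particles in } Q_2 \text{ producing particles in } Q', Q''\bigr)
\end{aligned} \tag{1.8.4}$$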

If we consider the collision that transforms two particles in $Q'$, $Q''$ into
two others in $Q_1$, $Q_2$ we must have, by the conservation of momentum and
energy in the collision:


$$p' + p'' = p + p_2, \qquad p'^2 + p''^2 = p^2 + p_2^2, \tag{1.8.5}$$


and the number of collisions leading from $p'$, $p''$ to $p$, $p_2$ can be expressed
in terms of the notion of collision cross-section $\sigma = \sigma(p', p''; p, p_2)$.

The latter is defined in terms of the number of particles of a stream with
momentum in $dp$ and spatial density $n(p)\,dp$, streaming around one particle
with momentum $p_2$, that collide with it in time $dt$, experiencing
a collision that changes $p$, $p_2$ into $p'$, $p''$. This number is written
as $n(p)\,dp\, \frac{|p - p_2|}{m}\, \sigma(p', p''; p, p_2)\, dt$ and $\sigma$ has the dimension of a
surface. Hence the total number of such collisions per unit volume will be
$n(p_2)\,dp_2\; n(p)\,dp \left[ \frac{|p - p_2|}{m}\, \sigma(p', p''; p, p_2) \right] dt$, i.e. the number of particles
with momentum in $dp$ that experience a collision with one $p_2$-particle in
time $dt$ is the number of particles with momentum in $dp$ and contained
in a volume of size $\frac{|p - p_2|}{m}\, \sigma(p', p''; p, p_2)\, dt$. It is therefore natural to call
collision volume (per unit time) the quantity in square brackets: because
it gives, after multiplication by the density of particles with momentum in
$dp$, the number of collisions per unit time and volume that particles with
momentum $p$ would undergo against a momentum $p_2$ particle if there was
only one such particle.

We introduce:


$f(p', q)\,dp'\,dq$ = number of particles with momentum $p'$,
within $dp'$, in the cube $dq$ =
“number of collision centers”


$f(p'', q)\,dp''$ = density of particles with momentum $p''$,
within $dp''$, in $q$ =                                          (1.8.6)
“density of particles that
can undergo collision”


$\sigma(p', p''; p, p_2)$ = differential cross-section per unit solid
angle for the considered collision.


Note that the collision volume associated with a single collision center is,
since the relative velocity at collision is $|p' - p''|/m = |p - p_2|/m$, also:


$$\frac{|p - p_2|}{m}\, \sigma(p', p''; p, p_2) \tag{1.8.7}$$


Hence the total number of collisions from $Q'$, $Q''$ to $Q_1$, $Q_2$ is, per unit
time:


$$f(p'', q)\,dp''\; \frac{|p' - p''|}{m}\, \sigma(p', p''; p, p_2)\; f(p', q)\,dp'\,dq, \tag{1.8.8}$$


clearly symmetric in $p'$, $p''$, although derived by treating $p'$ and $p''$
asymmetrically.
By a similar argument the number of “inverse” collisions is:


$$f(p, q)\, f(p_2, q)\, dp\, dp_2\, dq\; \frac{|p - p_2|}{m}\, \sigma(p, p_2; p', p''). \tag{1.8.9}$$


One then remarks that (1.8.5) imply:


$$dp'\,dp'' = dp\,dp_2 \quad (\text{“Liouville’s theorem”}), \qquad |p' - p''| = |p - p_2| \quad (\text{“conservation of momentum”}); \tag{1.8.10}$$

moreover the cross-section, as is in general true in collisions governed by
central forces, depends exclusively on the deflection angle between $(p' - p'')$
and $(p - p_2)$ and on the relative speed $|p' - p''|/m$; and it is proportional to
the (normalized) solid angle $d\omega$ into which $(p - p_2)$ points with respect to
$(p' - p'')$.

Note in this respect that the collision final data, i.e. $p$, $p_2$, do not determine
$p'$, $p''$ via (1.8.5) but they leave the direction $d\omega$ of $p' - p''$ arbitrary.

We shall then set $\sigma(p', p''; p, p_2) = \sigma(\omega, |p' - p''|)\, d\omega = \sigma(\omega)\, d\omega$ where
the last relation is only valid when the interaction between the spheres is
assumed a rigid sphere interaction; and from the scattering theory it follows
that in this case $\sigma(\omega)$ is independent of $\omega$: $\sigma(\omega) = 4\pi R^2$.
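
The kinematics expressed by (1.8.5) and (1.8.10) can be checked numerically. In the sketch below the function names, and the use of a unit vector `omega` for the direction of $p' - p''$, are choices made here rather than notation from the text. The incoming momenta $p'$, $p''$ are reconstructed from the outgoing pair $p$, $p_2$ and from the direction `omega`, and the conservation laws are then verified.

```python
import math
import random


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def scale(c, a):
    return tuple(c * x for x in a)


def norm(a):
    return math.sqrt(sum(x * x for x in a))


def random_unit_vector(rng):
    """Uniformly distributed direction, obtained by normalizing a Gaussian vector."""
    while True:
        v = tuple(rng.gauss(0.0, 1.0) for _ in range(3))
        n = norm(v)
        if n > 1e-12:
            return scale(1.0 / n, v)


def precollision(p, p2, omega):
    """Momenta p', p'' with p' + p'' = p + p2 and p' - p'' = |p - p2| * omega."""
    total = add(p, p2)
    g = norm(sub(p, p2))
    p_prime = scale(0.5, add(total, scale(g, omega)))
    p_second = scale(0.5, sub(total, scale(g, omega)))
    return p_prime, p_second


rng = random.Random(0)
p, p2 = (1.0, -0.5, 0.3), (-0.2, 0.7, 1.1)   # arbitrary outgoing momenta
omega = random_unit_vector(rng)               # the direction left free by (1.8.5)
pp, ps = precollision(p, p2, omega)

momentum_error = norm(sub(add(pp, ps), add(p, p2)))
energy_error = abs(sum(x * x for x in pp + ps) - sum(x * x for x in p + p2))
relative_speed_error = abs(norm(sub(pp, ps)) - norm(sub(p, p2)))
print(momentum_error, energy_error, relative_speed_error)
```

Any unit vector `omega` gives an admissible pair $p'$, $p''$, which is the arbitrariness of the direction mentioned above.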

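Hence (1.8.10) allow us to rewrite (1.8.8) as:


$$f(p', q)\, f(p'', q)\, dp\, dp_2\, dq\, d\omega\; \frac{|p' - p''|}{m}\, \sigma(\omega) \tag{1.8.11}$$


and (1.8.9) as the same expression with $f(p, q)\, f(p_2, q)$ in place of
$f(p', q)\, f(p'', q)$; here, given $p$, $p_2$, the vectors $p'$, $p''$ are computed from
(1.8.5) and from the information that the solid angle between $p - p_2$ and
$p' - p''$ is $\omega$.
Introducing (1.8.11) in (1.8.4) and dividing by $\varepsilon$ one finds the Boltzmann
equation:


$$\frac{\partial f}{\partial t}(p, q) + \frac{p}{m} \cdot \frac{\partial f}{\partial q}(p, q) = \int \frac{|p' - p''|}{m}\, \sigma(\omega)\, d\omega\, dp_2\, \bigl( f(p', q)\, f(p'', q) - f(p, q)\, f(p_2, q) \bigr). \tag{1.8.12}$$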
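
The left-hand side of the Boltzmann equation comes from the free-streaming part of (1.8.4). As a short worked step, assuming that $f$ is smooth enough to be expanded to first order in the time interval (written $\varepsilon$ here):

```latex
f(p, q, t) - f\!\left(p,\; q - \varepsilon \frac{p}{m},\; t - \varepsilon\right)
  = \varepsilon \left( \frac{\partial f}{\partial t}(p, q, t)
      + \frac{p}{m} \cdot \frac{\partial f}{\partial q}(p, q, t) \right)
    + O(\varepsilon^{2}) .
```

Dividing by $\varepsilon$ and letting $\varepsilon \to 0$, the rate of change of $f$ along the free trajectories is therefore balanced by the net number of collisions per unit time, which is the content of the right-hand side.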


In (1.8.12) one supposes that q varies over the whole space: but the most 
interesting cases concern systems (one should say “rarefied gases” because 
of the conditions under which (1.8.12) has been derived) confined in a given 
volume V. In such cases (1.8.12) must be complemented by suitable bound- 
ary conditions that depend on the microscopic nature of the collisions of 
the particles against the walls. 

Since the discussion of the boundary conditions is delicate we shall avoid 
it and in the case of confined systems we shall suppose, for simplicity, that 
periodic boundary conditions hold. This means that we imagine the volume 
V as a cube with opposite faces identified: i.e. a particle that collides with 
one of the cube faces reemerges, after the collision, from the opposite face 
and with the same velocity. For a deeper analysis of the problem of the 
boundary conditions (and in general of Boltzmann’s equation) see [Ce69]. 

It should be clear that (1.8.12) is an approximation because we neglected: 


(i) the possibility of multiple collisions, 


(ii) the possibility that particles in the same volume element do not behave
independently, as implicitly assumed in deriving (1.8.12), and instead, as 
time goes by, correlations between positions and velocities build up, and 
make certain collisions more probable than others, or multiple collisions 
relatively more probable with respect to binary ones. This approximation 
is sometimes called molecular chaos. 


Such effects should disappear in the Grad-Boltzmann limit (1.8.3) provided
they are absent at the initial time. This “conjecture” is known as Grad’s
conjecture on the validity of the “Stosszahlansatz”, a word that, for
traditional reasons, just denotes the lack of correlation between the motions of
different particles at various instants of time.

In Appendix 1.A2 we show how an analysis of the corresponding conjecture 
can be easily performed in a much simpler case, in which a gas of particles 
moves in a space occupied by randomly placed spherical scatterers: a model 
called Lorentz’s model. The particles collide with the scatterers but do not 
interact with each other, so that Boltzmann’s equation for this model turns 
out to be linear. 

Returning to Boltzmann’s equation and postponing the analysis of the 
fundamental assumptions (i) and (ii) discussed above, the irreversibility of 
the approach to equilibrium that it implies can be demonstrated on the 
basis of the following remarks. 

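Multiplying both sides of (1.8.12) by $1$, $p$, $p^2/2m$ or by $\log f(p, q)$ and
integrating over $p$ and $q$ (under the assumption that $f(p, q) \to 0$ rapidly as
$(p, q) \to \infty$ or, when $q$ is restricted to a fixed container, that $f$ satisfies
suitable boundary conditions on the $q$ coordinate) one finds that the quantities


$$\begin{aligned}
N &= \int f(p, q)\, dp\, dq, & P &= \int p\, f(p, q)\, dp\, dq, \\
T &= \int \frac{p^2}{2m}\, f(p, q)\, dp\, dq, & H &= -\int f(p, q) \log f(p, q)\, dp\, dq
\end{aligned} \tag{1.8.13}$$

satisfy the relations:

$$\begin{aligned}
\frac{dN}{dt} &= 0, \qquad \frac{dP}{dt} = 0, \qquad \frac{dT}{dt} = 0, \\
\frac{dH}{dt} &= \frac{1}{4} \int \frac{|p' - p''|}{m}\, \sigma(\omega)\, d\omega\, \bigl( f(p', q) f(p'', q) - f(p, q) f(p_2, q) \bigr) \\
&\qquad \cdot \bigl( \log f(p', q) f(p'', q) - \log f(p, q) f(p_2, q) \bigr)\, dp\, dp_2\, dq \;\ge\; 0
\end{aligned} \tag{1.8.14}$$


as can be checked by a simple calculation in which an essential role is played
by the symmetry of the right-hand side of (1.8.12) between $p$, $p_2$ and $p'$, $p''$
and in which $dp\, dp_2 = dp'\, dp''$ (i.e. Liouville’s theorem, see (1.8.10), and momentum
conservation, see (1.8.5)) is used together with the relations $\log x + \log y =
\log xy$ and $(x - y)(\log x - \log y) \ge 0$.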
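
The “simple calculation” can be sketched as follows. The abbreviations $f = f(p,q)$, $f_2 = f(p_2,q)$, $f' = f(p',q)$, $f'' = f(p'',q)$ and $B = |p'-p''|\,\sigma(\omega)/m$ are introduced here only for brevity, and boundary terms are assumed to vanish.

```latex
\begin{aligned}
\frac{dH}{dt}
  &= -\int (1 + \log f)\, \frac{\partial f}{\partial t}\, dp\, dq
   = -\int B\, (f' f'' - f f_2)\, \log f \; d\omega\, dp\, dp_2\, dq \\
  &= -\frac{1}{2} \int B\, (f' f'' - f f_2)\, \log (f f_2) \; d\omega\, dp\, dp_2\, dq \\
  &= \frac{1}{4} \int B\, (f' f'' - f f_2)\,
       \bigl( \log (f' f'') - \log (f f_2) \bigr) \; d\omega\, dp\, dp_2\, dq \;\ge\; 0 .
\end{aligned}
```

In the first line the streaming term drops out because $(p/m) \cdot \partial_q (f \log f)$ integrates to zero, and the contribution of the constant $1$ vanishes because the collision term conserves the number of particles. The second line uses the symmetry under the exchange of $p$ and $p_2$. The third is obtained by exchanging the pairs $(p, p_2)$ and $(p', p'')$, which changes the sign of $f' f'' - f f_2$ and preserves the measure by $dp\, dp_2 = dp'\, dp''$, and then averaging the two expressions. The sign follows from $(x - y)(\log x - \log y) \ge 0$.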

Therefore while the first three relations in (1.8.14) imply five conservation
laws (of the particle number, of momentum and of (kinetic) energy),
the fourth manifestly implies irreversibility and is called Boltzmann’s
H-theorem.

Furthermore (1.8.14) shows that the only possible equilibrium distributions
$f(p, q)$ can be those for which


$$f(p', q)\, f(p'', q) = f(p, q)\, f(p_2, q) \tag{1.8.15}$$


where $p$, $p_2$, $p'$, $p''$ satisfy (1.8.5).
Equation (1.8.15) and the arbitrariness of $p$, $p_2$, $p'$, $p''$ imply, via a simple
argument that we leave out,


$$f(p, q) = \rho(q)\, \frac{e^{-\beta(q)\,(p - p_0(q))^2/2m}}{\bigl(2\pi m\, \beta(q)^{-1}\bigr)^{3/2}} \tag{1.8.16}$$

where $\beta(q)$, $p_0(q)$, $\rho(q)$ are arbitrary functions and the factor in the
denominator of the right-hand side has been introduced for the purposes of a simple
normalization, so that $\rho(q)$ can be interpreted as the density at the point
$q$ since it is then $\rho(q) = \int f(p, q)\, dp$.
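
A quick numerical check that a Gaussian distribution of the form (1.8.16) satisfies (1.8.15) and has momentum integral equal to the density. The parameter values are arbitrary illustrative choices, the function names are introduced here, and the one-dimensional quadrature is a crude midpoint rule; none of this is taken from the text.

```python
import math
import random

MASS = 1.0  # particle mass in arbitrary units (illustrative)


def maxwellian(p, rho, beta, p0):
    """Distribution of the form (1.8.16) at a fixed point q."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, p0))
    return rho * math.exp(-beta * d2 / (2 * MASS)) / (2 * math.pi * MASS / beta) ** 1.5


def precollision(p, p2, omega):
    """Momenta p', p'' compatible with (1.8.5), with p' - p'' along the unit vector omega."""
    g = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, p2)))
    p_prime = tuple(0.5 * (a + b + g * w) for a, b, w in zip(p, p2, omega))
    p_second = tuple(0.5 * (a + b - g * w) for a, b, w in zip(p, p2, omega))
    return p_prime, p_second


def momentum_integral(rho, beta, cutoff=12.0, n=4000):
    """Integral of (1.8.16) over p: the Gaussian factorizes, so integrate in 1D and cube."""
    width = math.sqrt(MASS / beta)
    h = 2 * cutoff * width / n
    one_dim = h * sum(
        math.exp(-beta * (-cutoff * width + (i + 0.5) * h) ** 2 / (2 * MASS))
        for i in range(n)
    )
    return rho * one_dim ** 3 / (2 * math.pi * MASS / beta) ** 1.5


rng = random.Random(3)
rho, beta, p0 = 2.0, 0.7, (0.3, -0.1, 0.4)           # arbitrary parameters
p = tuple(rng.gauss(0.0, 1.0) for _ in range(3))
p2 = tuple(rng.gauss(0.0, 1.0) for _ in range(3))
raw = tuple(rng.gauss(0.0, 1.0) for _ in range(3))
length = math.sqrt(sum(x * x for x in raw))
omega = tuple(x / length for x in raw)                # random unit vector
pp, ps = precollision(p, p2, omega)

lhs = maxwellian(pp, rho, beta, p0) * maxwellian(ps, rho, beta, p0)
rhs = maxwellian(p, rho, beta, p0) * maxwellian(p2, rho, beta, p0)
density = momentum_integral(rho, beta)
print(lhs, rhs, density)
```

The equality of the two products reflects the fact that the logarithm of the Gaussian is a combination of the quantities conserved in (1.8.5).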

This means that the equilibrium distributions $f(p, q)$ necessarily have the
form (1.8.16). Considering subsequently the simple case of a system in a
cubic container with periodic boundary conditions it is easy to show that,
if $f$ satisfies (1.8.12), (1.8.16) and $\partial f/\partial t = 0$ (i.e. it is stationary) then it
must necessarily be that $\beta(q)$, $\rho(q)$, $p_0(q)$ are $q$-independent.

In fact if $f$ has the form (1.8.16) the right-hand side of (1.8.12) vanishes
and, therefore, $\partial f/\partial t = 0$ implies $p \cdot \partial f/\partial q = 0$. Hence denoting by $\hat f(p, k)$
the Fourier transform of $f$ with respect to $q$ this implies that $p \cdot k\, \hat f(p, k) = 0$,
so that if $\hat f(p, k)$ is continuous in $p$ it must be that $\hat f(p, k) = 0$ for $k \neq 0$.
This means that $f$ is $q$-independent and that $\beta(q)$, $\rho(q)$, $p_0(q)$ are constants.

We see that the H-theorem not only shows that the system evolves
irreversibly, but it also shows that the one-particle distribution $f(p, q)$ evolves
towards the free Maxwell-Boltzmann distribution which, one should not fail
to note, is just a typical property of an element $\mu$ of the canonical ensemble
in a system in which the interaction energy between the particles is so small
(when their hard cores do not overlap) that the total energy of the system
can be identified with the kinetic energy. The parameters $\beta$, $\rho$, $p_0$ of this
distribution are uniquely determined by the initial data via the conservation
laws in (1.8.14).
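
The relaxation towards a Maxwellian and the conservation laws in (1.8.14) can be illustrated with a crude Monte Carlo experiment on a spatially homogeneous gas. This is a sketch under strong simplifying assumptions that are not in the text: positions are ignored, the colliding pairs are chosen uniformly at random (so the collision rate does not depend on the relative speed, unlike the hard sphere case), the direction after the collision is isotropic, and the mass is set to 1. It is not a solver for (1.8.12). The ratio of the fourth moment of one momentum component to the square of its second moment is used as a cheap indicator of the shape of the distribution: it equals 3 for a Gaussian.

```python
import math
import random


def random_unit_vector(rng):
    """Uniformly distributed direction, obtained by normalizing a Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        n = math.sqrt(sum(x * x for x in v))
        if n > 1e-12:
            return tuple(x / n for x in v)


def collide(pa, pb, rng):
    """Elastic collision of equal masses with a random direction of the relative momentum."""
    omega = random_unit_vector(rng)
    g = math.sqrt(sum((a - b) ** 2 for a, b in zip(pa, pb)))
    new_a = tuple(0.5 * (a + b + g * w) for a, b, w in zip(pa, pb, omega))
    new_b = tuple(0.5 * (a + b - g * w) for a, b, w in zip(pa, pb, omega))
    return new_a, new_b


def totals(momenta):
    """Total momentum and total kinetic energy (mass = 1)."""
    total_p = tuple(sum(p[i] for p in momenta) for i in range(3))
    total_t = 0.5 * sum(sum(c * c for c in p) for p in momenta)
    return total_p, total_t


def moment_ratio(momenta):
    """<p_x^4> / <p_x^2>^2 for the first momentum component."""
    n = len(momenta)
    m2 = sum(p[0] ** 2 for p in momenta) / n
    m4 = sum(p[0] ** 4 for p in momenta) / n
    return m4 / m2 ** 2


def relax(momenta, collisions, rng):
    """Apply a given number of random binary collisions to a copy of the list."""
    momenta = list(momenta)
    for _ in range(collisions):
        i, j = rng.sample(range(len(momenta)), 2)
        momenta[i], momenta[j] = collide(momenta[i], momenta[j], rng)
    return momenta


rng = random.Random(1)
initial = [random_unit_vector(rng) for _ in range(2000)]  # all speeds equal: not a Maxwellian
final = relax(initial, 20000, rng)                         # about 20 collisions per particle

print("moment ratio before and after:", moment_ratio(initial), moment_ratio(final))
print("totals before:", totals(initial))
print("totals after: ", totals(final))
```

The total momentum and kinetic energy are unchanged by the collisions up to rounding, while the moment ratio moves from the value of the initial distribution towards the Gaussian one, within statistical fluctuations.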

It is natural to think that the H-theorem is, for rarefied gases, the micro- 
scopic version of the second law of thermodynamics which states that in 
isolated systems entropy increases (while equilibrium is approached): en- 
tropy should be identified as proportional to H. 

It is therefore important to stress that the H-theorem is manifestly in
contrast with the reversibility properties of Newton’s equations and,
consequently, it cannot be a mathematical consequence of the latter, as already
remarked, at least not in the literal, i.e. naive, sense of the word.

Just for this reason it becomes essential to understand whether this
contrast between macroscopic irreversibility and microscopic reversibility can
be overcome.

The alleged incompatibility between the two conflicting properties was the
cause of violent criticism of Boltzmann, who created the ergodic hypothesis
(thus laying the foundations of the modern ergodic theory) in one of
his attempts to answer his critics on a theoretical basis, solidly resting on
mechanics rather than on the admittedly obscure Stosszahlansatz.

To investigate the question one can take two viewpoints, which appear 
somewhat to overlap in Boltzmann’s brilliant and misunderstood (by his 
contemporaries) attempt at defending his theory and his H-theorem. 

The first point of view is that the ergodic hypothesis holds, in the sense of
(i) in §1.7 reinforced by (ii) and (iii), and therefore $dH/dt \ge 0$ could be only
approximately true in the sense that it should hold “most of the time”: when
the cell $S^k \Delta$ that represents the microscopic state at the instant $k\tau$ runs
through a great part of the ergodic cycle of given energy (i.e. the part in
which the interesting macroscopic observables, see (1.5.5), are also constant
for practical purposes). The relation $dH/dt \ge 0$ would then become false
when $S^k \Delta$ exits such region.

The latter circumstance however can only happen, in really macroscopic
systems as well as in systems with a few tens of particles, at time
intervals longer, by far, than the longest astronomical scales, see §1.4,
(1.4.3).

Therefore the system would for all practical purposes evolve irreversibly 
(and the evolution irreversibility would be symmetrical in time!). Reversibil- 
ity could manifest itself over time scales beyond eternity, i.e. of many orders 
of magnitude greater than the age of the Universe, already for systems like 
a gas at normal conditions in a container of the size of a room, or of a very 
small box. Or, alternatively, for an extremely short time around the initial 
time: enough to “forget” the peculiarity of the preparation of the initial 
state. 

A system set up initially in an “atypical” condition, e.g. occupying only 
half of the container, would expand to occupy the whole container and then 
it would continue to evolve without “ever” returning to occupy the initial 
half. 

Of course if a daemon acting a few seconds after the initial time inverts 
all the velocities of all the particles of the system, then the system would 
retrace its previous evolution coming back to the initial state (but just for 
a very short time) and then it would evolve by again occupying the whole 
container proceeding towards equilibrium exactly as it would have done 
if its velocities had not been inverted (and, furthermore, according to an 
evolution law described approximately by Boltzmann’s equation). 

This inversion of the motion with production of an atypical situation after
a short time (i.e. a non astronomically long time) from the initial instant
requires the exact inversion of all velocities: if they were inverted with an
error, even very small (provided not “astronomically small”), the system
would not go back and, instead, it would probably continue to evolve as
if “nothing had happened” from a macroscopic point of view, [LV93]. The
effort of the daemonic heavenly creature to intervene on earthly affairs, after
leaving the realm of metaphysics, would therefore be in vain.
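
The role of an exact inversion of the velocities can be illustrated on a toy model. The sketch below is not the system discussed in the text and rests on assumptions made here: the particles do not interact with each other but move on a circle of `M` integer sites under a strong position-dependent kick, and all the arithmetic is done on integers so that the dynamics is exactly reversible on a computer. Inverting all momenta exactly brings every particle back to its initial site; inverting them with an error of a single integer unit does not.

```python
import math
import random

M = 2 ** 20   # number of sites; positions and momenta are integers modulo M
KICK = 5.0    # kick strength in units of M / (2 pi); chosen large to make the motion chaotic


def force(x):
    """Integer-valued kick depending only on the position."""
    return int(round(KICK * M / (2 * math.pi) * math.sin(2 * math.pi * x / M)))


def step(x, p):
    """Kick, drift, kick: a map that is exactly reversible under p -> -p."""
    p = (p + force(x)) % M
    x = (x + p) % M
    p = (p + force(x)) % M
    return x, p


def evolve(state, n):
    for _ in range(n):
        state = [step(x, p) for x, p in state]
    return state


def invert(state, error=0):
    """Reverse all momenta, optionally with a small integer error."""
    return [(x, (-p + error) % M) for x, p in state]


def fraction_in_initial_region(state):
    """Fraction of particles in the quarter of the circle occupied at time 0."""
    return sum(1 for x, _ in state if x < M // 4) / len(state)


rng = random.Random(2)
initial = [(rng.randrange(M // 4), rng.randrange(M)) for _ in range(500)]
forward = evolve(initial, 30)
exact_return = invert(evolve(invert(forward), 30))
perturbed_return = invert(evolve(invert(forward, error=1), 30))

print("forward:  ", fraction_in_initial_region(forward))
print("exact:    ", fraction_in_initial_region(exact_return))
print("perturbed:", fraction_in_initial_region(perturbed_return))
```

After the exact inversion the initial configuration is recovered site by site. After the perturbed inversion the particles remain spread over the circle, so that the macroscopic observable (the fraction in the initial quarter) looks as if nothing had happened.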

The second point of view has a more mathematical character and attempts
to make the argument just described quantitative by relating it to
Boltzmann’s equation.

One imagines an initial datum in which particles, hard spheres with radius
$R$, are independently distributed in phase space; we suppose that the density
with which each of them is distributed is $\rho f_0(p, q)$ where $f_0$ is normalized
to 1: $\int f_0(p, q)\, dp\, dq = 1$.⁸

This system is evolved with Hamilton’s equations (i.e. with elastic collision
rules) and at time $t$ one supposes that it is described by a distribution
$\rho f_t(p, q)$, without however assuming that the particles are independently
distributed; this means that the one-particle distribution $\rho f_t(p, q)$ provides
only the information on the number of particles in $dp\, dq$ but no longer on their
correlations (as was the case at time 0, by construction) which will be
nontrivial, just because the “Stosszahlansatz” will not hold.

We now imagine, keeping $t$ fixed, that $\rho \to \infty$, $R \to 0$ so that $\rho R^3 \to 0$ but
$\rho R^2 = \lambda$ = fixed quantity: i.e. we consider the Grad-Boltzmann limit, see
(1.8.3). If the above qualitative discussion is correct and if one remarks that
in this limit the gas becomes a perfect gas (because the particles become
point masses) in which equilibrium is attained by virtue of collisions
between pairs of particles without two particles ever colliding more than once
(because $R \to 0$ implies just this, as it is easy to estimate the probability
of recollision, i.e. of the event in Fig. 1.8.1, as proportional to $\rho R^3 \to 0$,
per unit time), then recalling that we denote
$\rho f_t$ the evolving one-particle density (to keep the integral of $f_t$ normalized
to 1) one has to conclude that the evolution of the limit $\lim f_t = \bar f_t$ should
be rigorously described by the Boltzmann equation which for $\bar f_t$ is


$$\frac{\partial \bar f_t}{\partial t} + \frac{p}{m} \cdot \frac{\partial \bar f_t}{\partial q} = (\rho R^2) \int \frac{|p' - p''|}{m}\, \frac{\sigma(\omega)}{R^2}\, dp_2\, d\omega\, \bigl( \bar f_t(p', q)\, \bar f_t(p'', q) - \bar f_t(p, q)\, \bar f_t(p_2, q) \bigr). \tag{1.8.17}$$


[Fig. 1.8.1: Trajectory of C collides twice with A (A, B imagined fixed to simplify
drawing).]


8 Note that if one throws randomly and independently hard spheres in a box then some
of them may overlap. It is convenient not to exclude such a possibility provided one
disregards completely the interaction of the overlapping spheres as long as they overlap
and starts considering it only after they separate because of the motion: this is clearly
a trick that introduces some minor simplification of the discussion while not affecting
the macroscopic properties of the (rarefied) gas.


One should note that this is $\rho$, $R$-independent because $\rho R^2$ and $\sigma(\omega)/R^2$
are independent of $R$ (recall that we are considering the case of hard sphere
systems).

Hence Boltzmann’s equation should describe correctly the evolution
of a rarefied gas for arbitrary times $t$: in fact we expect that in the Grad
limit the recurrence times grow infinitely large while the collisions make the
system evolve on a time scale fixed by the flight time: $((\rho R^2)\, \bar v)^{-1}$. This can
also be seen from (1.8.17) in which the time scale is fixed by $\rho R^2\, |p' - p''|/m$
which, on average, is just $\sim \rho R^2 \bar v$.

Equation (1.8.17) has been proved with complete mathematical rigor only
recently and for times $t < 1/\rho R^2 \bar v$ for systems of hard spheres and for
interesting classes of initial data $f_0$: this is the content of Lanford’s theorem
on the Grad conjecture, [La74].

This is an important confirmation, mathematically rigorous, of
Boltzmann’s point of view according to which reversibility, and the corresponding
recurrence times, is not in contradiction with the experimental observation
of irreversibility: the time scale over which reversibility manifests
itself is not observable, while that in which irreversibility can be observed
is related to the time of free flight $(\rho R^2 \bar v)^{-1}$. Furthermore we see that
irreversibility is not incompatible with the ergodic hypothesis, and
Boltzmann’s equation provides us with a model of the development of irreversible
motions in situations in which the recurrence times are “infinitely” longer
(even on astronomical scales) than the average time needed for a molecule
to travel the free path (i.e. the flight time).

Thus Lanford’s theorem, although it presents moderate interest for the
applications due to the shortness of the interval of validity, $t < (\rho R^2 \bar v)^{-1}$,
has an enormous conceptual importance (apparently not yet fully appreciated
by many) because it shows in a mathematically precise and rigorous
fashion that there is no incompatibility between irreversible evolutions like
the one described by the Boltzmann equation and the completely reversible
Hamilton equations that describe the details of the microscopic motions. In
fact mathematical rigor is particularly welcome here in consideration of the
enormous amount of speculation on the theme and of purported proofs of
inconsistency between macroscopic irreversibility and mechanics. It has to
be hoped that, with time, Lanford’s theorem will be appreciated as a basic
advancement of statistical mechanics.

There is already a vast literature that developed following the spirit of 
Lanford’s work (which was heralded by various works) and here I cannot 
discuss the matter further: for a proof developed with attention to the later 
developments and for the developments themselves the reader is referred to 
the recent treatise by Spohn, [Sp91], pp. 48-76. 

This concludes our general introduction to statistical mechanics. We have 
seen that classical statistical mechanics holds only under certain conditions 
(see §1.2, for instance) at least as formulated here. It remains to analyze its 
consequences to deduce some of its applications and a better understanding 
of its validity and limitations. 

Such an understanding is based, as already remarked, on the very consequences
of the theory and it cannot be derived a priori as shown, for
instance, by the fact that the basic condition in §1.2, namely $\theta_+/\theta_- > 1$,
is compatible with very reasonable values of the temperature for “everyday
physics” only because the intensity $\varepsilon$ of the molecular interaction energy
has order of magnitude of $\sim 10^{-14}$ erg and the radius of the molecules has
size $\sim 2 \times 10^{-8}$ cm. If these experimentally determined data had been very
different the condition $\theta_+/\theta_- > 1$ could be impossible to satisfy at temperatures
of importance for the observations usually carried out by classical
thermodynamics. See Chap.II,III for a discussion of the latter points.


§1.9. A Historical Note. The Etymology of the Word “Ergodic” and the Heat Theorems


This section and Appendix 1.A1 are written so as to be independent of
the preceding sections: therefore there are here and there a few repetitions
of subjects already analyzed in §1.1-§1.8. The few references to the previous
sections are meant for readers familiar with them, but they are not essential
for reading this section and Appendix 1.A1.

What follows is an expanded and revised version of various of my writings
on Boltzmann’s work, [Ga81], [Ga89].⁹


(1) The etymology of the word “ergodic” and the heat theorems. 


Trying to find the meaning of the word ergodic one is led to a paper by 
Boltzmann, [Bo84]: see the footnote of S. Brush in his edition, [Bo64], of 
the Lectures on Gas Theory, on p. 297 (§5.10): here Boltzmann’s paper is 
quoted as the first place where the word is introduced. Brush acutely warns 
the reader that the Ehrenfests’ paper misrepresents the opinions and even the
terminology of Boltzmann and Maxwell and dates (in agreement with Gibbs,
see p. vi of the introduction of [Gi81]) the first appearance of the concept to
1871, [Bo71b]. For instance the etymology that one finds in the Ehrenfests’
paper is incorrect on this point: see [EE11], note #93, p. 89 (where also the
first appearance of the word is incorrectly dated and quoted).

⁹ Readers might be interested in the referee report to one of my papers, [Ga95a], as it
shows, in my opinion, how blind to evidence a historian of science can be at times.
The contents of the paper in question are reproduced here; the referee report and the
corresponding unamended original version can be found in [Ga95b] (in English).

In fact the basic idea of ergodicity can perhaps be traced to even earlier 
works, namely to the first work of Boltzmann on the theory of heat, [Bo66]: 
on p. 30 one finds that “... this explanation is nothing else but the mathematical
formulation of the theorem according to which the paths that do not
close themselves in any finite time can be regarded as closed in an infinite 
time” (in this paper one also finds a general derivation of the necessity of the 
identification between average kinetic energy and absolute temperature). 

The [Bo84] paper by Boltzmann is seldom quoted: I found only Brush’s
reference in [Bo64], and a partial account in [Br76], p. 242 and p. 368,
before my own etymological discussion appeared in print in [Ga81], [Ga89],
[Ga95a]. More recently the paper has been appropriately quoted by [VP92];
the paper was discussed also by [Ma88]. However no English translation of 
[Bo84] is available yet. Nevertheless I think that this is one of the most 
interesting papers of Boltzmann: it is a precursor of the work of Gibbs, 
[Gi81], on ensembles, containing it almost entirely (if one recalls that the 
equivalence of the canonical and microcanonical ensembles was already es- 
tablished (elsewhere) by Boltzmann himself, [Bo68], [Bo71]), and I will try 
to motivate this statement. 

The paper stems from the important, not too well known, work of 
Helmholtz, [He95a], [He95b], who considered what we call today a system 
whose phase space contains only periodic orbits, or cycles of distinct energies:
i.e. essentially a one-dimensional conservative system. He called such
systems monocyclic systems and noted that they could be used to provide 
models of thermodynamics in a sense that Boltzmann undertakes to extend 
to a major generalization. 

After an introduction, whose relative obscurity has been probably respon- 
sible for the little attention this paper has received, Boltzmann gives the 
notion of stationary probability distribution on the phase space of N in- 
teracting particles enclosed in a vessel with volume V. He calls a family 
E of such probabilities a monode, generalizing an “analogous” concept on 
monocyclic systems. In fact Boltzmann first calls a monode just a single 
stationary distribution regarded as an ensemble. But sometimes later he 
implicitly, or explicitly, thinks of a monode as a collection of stationary 
distributions parameterized by some parameters: the distinction is always 
clear from the context. Therefore, for simplicity, I here take the liberty of 
calling monode a collection of stationary distributions, and the individual 
elements of the collection will be called “elements of the monode”. 

The etymology that follows, however, is more appropriate for the elements 
of the monodes, as they are thought to consist of many copies of the same 
system in different configurations. By reading Boltzmann’s analysis one 
can get the impression, see p. 132 of [Bo84], that the word “monode” had 
already been introduced by Maxwell, in [Ma79]; however the reference to
Maxwell is probably meant to refer to the notion of stationarity rather than
to the word monode which does not seem to appear in [Ma79].

In fact the orbits of a monocyclic system can be regarded as endowed with 
a probability distribution giving an arc length a probability proportional to 
the time spent on it by the motion: hence their family forms a family of 
stationary probability distributions. 

Etymologically, from the context of [Bo84], this appears to mean a family of
stationary distributions with a “unique nature” (each consisting of systems
with a “unique nature”, differing only by the initial conditions), from μόνος
and εἶδος, with a probable reference to Plato and Leibniz. The concept
appears, in fact, in some of Plato’s dialogues, see the entry μονοειδής (“one
in kind”) in [LS94].

Then the following question is posed. Given an element $\mu$ of a monode
E we can compute the average values of various observables, e.g. average
kinetic energy, average total energy, average momentum transfer per unit
time and unit surface in the collisions with the vessel walls, average volume
occupied and density, denoted, respectively:

$$T = \frac{1}{N}\langle K\rangle_\mu, \quad U = \langle K + \Phi\rangle_\mu, \quad p, \quad V, \quad \rho = \frac{N}{V} \qquad (1.9.1)$$

where $\Phi$ denotes the potential interaction energy and K the total kinetic
energy. We then imagine varying $\mu$ in the monode E, by an infinitesimal
amount (this means changing any of the parameters which determine the
element). Question: is it true that the corresponding variations dU and dV
are such that:

$$\frac{dU + p\,dV}{T} \quad \text{is an exact differential } dS\,? \qquad (1.9.2)$$

In other words is it true that the above quantities, defined in purely mechanical
terms, satisfy the same relation that would hold between them if,
for some thermodynamic system, they were the thermodynamic quantities
bearing the same name, with the further identification of the average kinetic
energy with the absolute temperature? (§1.5).

That the temperature should be identified with the average kinetic energy
per particle was quite well established (for free gases) since the paper by
Clausius, [Cl65], and the paper on the equipartition of kinetic energy by
Boltzmann, [Bo66], [Bo68] (in the interacting cases); see the discussion of
it in Maxwell’s last scientific work, [Ma79]. The latter paper is also very
interesting as Maxwell asks there whether there are other stationary distributions
on the energy surface, and tries to answer the question by putting
forward the ergodic hypothesis. If so the monode would provide a “mechanical
model of thermodynamics” extending, by far, the early examples
of Helmholtz on monocyclic systems.

Thus Boltzmann is led to the following definition, see §1.5, (1.5.6):

Definition: a monode E is called an orthode if the property described by
(1.9.2) holds.


From [Bo84] one sees that the word “orthode” is composed of ὀρθός and
εἶδος, i.e. “right nature” or “correct nature”.

The above deep definition has not been taken up by the subsequent literature.
This is surprising, even more so as Boltzmann, in the same paper,
proceeds to discuss “examples” of mechanical models of thermodynamics,
i.e. examples of orthodic monodes. The above orthodicity concept is still
attributed to Gibbs, see [Br76], p. 242.

The examples of orthodes discussed by Boltzmann in his paper are the
holode and the ergode which are two ensembles whose elements are parameterized
with two parameters $\beta, N$ or $U, N$, respectively. Their elements
are

$$\mu_{\beta,N}(dp\,dq) = \frac{dp_1 \ldots dp_N\,dq_1 \ldots dq_N}{\mathrm{const}}\; e^{-\beta(K + \Phi)} \qquad (1.9.3)$$

and

$$\mu_{U,N}(dp\,dq) = \frac{dp_1 \ldots dp_N\,dq_1 \ldots dq_N}{\mathrm{const}}\; \delta\big(K(p) + \Phi(q) - U\big) \qquad (1.9.4)$$

Boltzmann proves that the above two ensembles are both orthodes, thus
establishing that the canonical and the microcanonical ensembles (using
our modern terminology) are equilibrium ensembles and provide mechanical
models of thermodynamics, see Chap.II for a discussion of similar proofs.

Boltzmann’s simple proof makes use of the auxiliary (with respect to the 
above definition) notion of heat transfer. In the canonical case it yields 
exactly the desired result; in the microcanonical it is also very simple but 
somehow based on a different notion of heat transfer. An analysis of the 
matter easily shows, see §3.2 in Chap.II, that a definition of heat transfer for
the microcanonical ensemble consistent with that of the canonical ensemble
gives the result (1.9.2), but only up to corrections expected to be of order
$O(N^{-1})$. As we have alluded to in §1.6 there is a problem only if one insists
on defining in the same way the notion of heat transfer in the two cases:
Boltzmann does not even mention this, possibly because he saw as obvious 
that the two notions would become equivalent in the thermodynamic limit. 

Again from the context of [Bo84] one sees that the word “holode” has the
etymological origin of ὅλος and εἶδος while “ergode” is a shorthand for
“ergomonode” and it has the etymological root of ἔργον and εἶδος, meaning
a “monode with given energy”, [Ga81], [Ga95a].

The word “ergode” appears for the first time on p. 132 of [Bo84] but this 
must be a curious misprint, as the concept is really introduced on p. 134. 
On p. 132 the author probably meant to say “holode” instead; this has been 
correctly remarked by [VP92]. The above etymology was probably proposed 
for the first time by myself in various lectures in Roma, and it was included 
in the first section of [Ga81]. It has also been proposed in [Ja84], [Ma88]. 
The word “holode” is probably a shorthand for “holomonode”, meaning a
“global monode” (perhaps a monode involving states with arbitrary energy,
i.e. spread over the whole phase space).

This is not what is usually believed to be the etymology of “ergode”: the
usual belief¹⁰ comes from the Ehrenfests’ statement that the etymology is
ἔργον and ὁδός, with the meaning of “unique path on the surface of constant
energy”, see note #93 in [EE11]. The latter etymology has been taken
up universally and has been attached to the subject of “ergodic theory”,
which is a theory dealing with time evolution properties.


(2) The ergodic hypothesis, continuous and discrete phase space 


The etymological error of the Ehrenfests could be just an amusing fact:
but it had a rather deep negative influence on the development of 20th
century physics. They present their etymology in connection with the
discussion (amounting to a de facto rejection) of the ergodic hypothesis of 
Boltzmann. In fact Boltzmann had come to the ergodic hypothesis in his 
attempts to justify a priori that the ergode, as a model of thermodynamics, 
had to produce the thermodynamics of a system with the given Hamiltonian 
function (and not just a model). 

Boltzmann had argued that the trajectory of any initial datum evolves on 
the surface of constant energy, visiting all phase space points and spending 
equal fractions of time in regions of equal Liouville measure. See §1.3.

The Ehrenfests criticize such a viewpoint on surprisingly abstract math- 
ematical grounds: basically they say that one can attach to each different 
trajectory a different label, say a real number, thus constructing a function 
on phase space which is constant on trajectories. Such a function would of 
course have to have the same value on points on the same trajectory (i.e. it 
would be a constant of motion). This is stated in the note #74, p. 86 
where the number of different paths is even “counted”, and referred to in 
the note #94, p. 89. Therefore, they conclude, it is impossible that there is 
a single path on the surface of constant energy, i.e. the ergodic hypothesis 
is inconsistent (except for monocyclic systems, for which it trivially holds). 

The abstract mathematical nature of this argument, see also below for a 
critique, was apparently remarked on only by a mathematician, see [VP92] 
p. 86, (i.e. by Borel, 1914); but it escaped many physicists. It is worrying 
to note how literally so many took the Ehrenfests’ version of the ergodic 
hypothesis and how easily they disposed of it, taking for granted that their 
formulation was the original one by Boltzmann and Maxwell, see [Br76], p. 
383. 

Having disposed of the ergodic hypothesis of Boltzmann, the Ehrenfests
proceeded to formulate a new hypothesis, the rather obscure (and somewhat
vague as no mention is made of the frequency of visits to regions in phase
space) quasi-ergodic hypothesis, see notes #98 and #99, p. 90, in [EE11];
it led physicists away from the subject and it inspired mathematicians to
find the appropriate definition giving birth to ergodic theory and to its first
nontrivial results.

¹⁰ It is important, in this respect, to be aware that Boltzmann had studied the Greek
language and, by his own account, quite well: see [Bo74], p. 133, to the point of having
known at least small parts of Homer by heart. Hence there should be no doubt that he
did distinguish the meanings of εἶδος and ὁδός which are among the most common
words.

The modern notion of ergodicity is not the quasi-ergodicity of the Ehren- 
fests. It is simply based on the remark that the Ehrenfests had defined a 
nontrivial constant of motion very abstractly, by using the axiom of choice. 
In fact from the definition, consisting in attaching a different number, or
even $6N - 2$ different numbers, to each distinct trajectory, there is in principle
no way of constructing a table of the values of the function so defined in
order to distinguish the different trajectories. In a system which is ergodic
in the modern sense the Ehrenfests’ construction would lead to a nonmeasurable
function; and to a physicist endowed with common sense such a
function, which in principle cannot be tabulated, should appear as nonexistent,
or as not interesting. Thus motion on the energy surface is called
ergodic if there are no measurable constants of motion: here measurable is a 
mathematical notion which essentially states the possibility of a tabulation 
of the function. 

It is surprising that a generation of physicists could be influenced (in be- 
lieving that the ergodic hypothesis of Boltzmann had to be abandoned as a 
too naive viewpoint) by an argument of such an exquisitely abstract nature, 
resting on the properties of a function that could not be tabulated (and not 
even defined if one did not accept the sinister axiom of choice). What is 
remarkable is the coincidence that the recognition and the development of 
the axiom of choice was due essentially to the same Zermelo who was one 
of the strongest opponents of Boltzmann’s ideas on irreversibility; see also 
[Sc86]. 

Therefore it is worth, perhaps, trying to understand what Boltzmann may 
have meant when he formulated the ergodic hypothesis. Here one cannot 
fully rely on published work, as the question was never really directly ad- 
dressed by Boltzmann in a critical fashion (he might have thought, rightly, 
that what he was saying was clear enough). The following analysis is an
elaboration of [Ga81], [Ga95a]; in some respects it gets quite close to [VP92].
It should be noted that [VP92] has a somewhat different point of view 
on several key issues, although we seem to share the main thesis that the 
[EE11] paper is responsible for most of the still persisting misunderstand- 
ings on Boltzmann’s work, including the exclusive attribution to Gibbs of 
Boltzmann’s ideas on ensembles, so clearly elaborated in [Bo84]. This is 
so even though, by reading the literature carefully, it is possible to realize 
that many were aware of the connection of Gibbs’ work with Boltzmann’s; 
see for instance [Br76], p. 242, first of all Gibbs, see p. vi of [Gi81] where 
he quotes the first section [Bo71c] of [Bo71b]. 

My point of view, adopted in the preceding sections, is that of those who 
believe that Boltzmann always conceived of the phase space and time as 
discrete spaces, divided into small cells, see [Bo72], p. 346. He always 
stressed that the continuum must be understood as a limit, see §1.1 (see
also [Br76], p. 371, and [Kl62], [Kl72], [Kl73], [Du59]). The book by Dugas,
[Du59], is particularly illuminating (also) in this respect (see for instance
Chap.I and the quotations of Boltzmann presented there, where he appears
to identify the discrete viewpoint with the atomistic conceptions). In his
writings Boltzmann very often makes this point: see for instance pp. 42-44,
note 4 on p. 51 (discrete time), p. 54, p. 168, p. 169, p. 243 (discrete
time), pp. 252-253, in [Bo74].

Although Boltzmann seems to have sometimes been quite apologetic about 
such a viewpoint (even calling it a “mathematical fiction”, [Ba90], p.18, from 
[Bo72]; see also [VP92], p. 75), he took advantage of it to the point that 
one can say that most of his arguments are based on a discrete conception 
of phase space, followed at the end by a passage to the continuum limit, see
§1.1. It should be understood however that the discretization that Boltzmann
had in mind is by no means to be identified with the later concept of
coarse graining; see Chap.IX where a modern version of Boltzmann’s discretization
is considered and where a distinction has to be made between
cells and volume elements, see also [VP92] and [Ga95a].

It is easier for us, by now used to numerical simulations, to grasp the 
meaning of a “cell”: in the numerical simulations a cell is simply an ele- 
ment of the discrete set of points in phase space, each represented within 
computer precision (which is finite). One should always discuss how much 
the apparently harmless discreteness of phase space affects results. This is, 
however, almost never attempted, see [Ga95a] for an attempt. A “volume 
element” in phase space has, instead, a size much larger than the machine 
resolution, so that it looks like a continuum (for some purposes). In the 
previous sections we have been careful to keep the discrete treatment of 
phase space always quite explicit, so that later we shall easily be able to see
what the consequences are of a verbatim interpretation of the phase space
discreteness.

Hence one can say that an essential characteristic of Boltzmann’s thought is 
to have regarded a system of N atoms, or molecules, as described by a cell of 
dimension $\delta q$ and $\delta p$ in each position and momentum coordinate. He always
proceeded by regarding such quantities as very small, avoiding entering into 
the analysis of their size, but every time this had some importance he seems 
to have regarded them as positive quantities. 

A proof of this is when he refutes Zermelo’s paradoxes by counting the 
number of cells of the energy surface of 1 cm³ of normal air, [Bo96], a feat
that can only be achieved if one considers phase space as discrete. His 
calculation has been discussed in §1.4, (1.4.3). 

In particular this point of view must have been taken when he formulated 
the ergodic hypothesis: in fact conceiving the energy surface as discrete 
makes it possible to assume that the motion on it is “ergodic”, i.e. it visits 
all the phase space points identified with cells, compatible with the given 
energy (and possibly with other “trivial” constants of motion), thus behav- 
ing as in a monocyclic system (as all the motions are necessarily periodic). 
This is in fact the definition in §1.3. 
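
The discrete picture just described can be illustrated with a short Python sketch. It is only a toy model under stated assumptions: the phase space is a hypothetical set of M labelled cells, one time step is a permutation `S` of the cells, and the names `orbit`, `time_average` and `is_one_cycle` are introduced here for illustration. When the permutation is a single cycle the motion visits every cell and the time average of any observable equals the average with equal weight on all cells; when it splits into several cycles this fails.

```python
def orbit(S, start):
    """Cells visited from `start` under the one-step map S (a permutation
    of the cells 0..M-1) before the motion returns to `start`."""
    cells, c = [start], S[start]
    while c != start:
        cells.append(c)
        c = S[c]
    return cells


def time_average(S, F, start):
    """Average of the observable F (one value per cell) over one full period."""
    cells = orbit(S, start)
    return sum(F[c] for c in cells) / len(cells)


def is_one_cycle(S):
    """True if the motion visits every cell, i.e. S is a single cycle."""
    return len(orbit(S, 0)) == len(S)


# Hypothetical example: 7 cells, one time step shifts the cell label by 3.
M = 7
S = [(c + 3) % M for c in range(M)]
F = [c * c for c in range(M)]      # an arbitrary observable
uniform_average = sum(F) / M       # equal weight on every cell

# A shift by 2 on 6 cells splits into two cycles (even and odd labels).
S_split = [(c + 2) % 6 for c in range(6)]
F_split = [c * c for c in range(6)]

if __name__ == "__main__":
    print(is_one_cycle(S), time_average(S, F, 0), uniform_average)
    print(is_one_cycle(S_split),
          time_average(S_split, F_split, 0),
          time_average(S_split, F_split, 1))
```

In the second example the time average depends on the initial cell, which is the discrete analogue of a nontrivial constant of motion (here the parity of the cell label).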

The passage to the continuum limit of such an assumption, which seems never
to have been made by Boltzmann, is of course extremely delicate, and
it does not lead necessarily to the interpretation given by the Ehrenfests. It
can easily lead to other interpretations, among which the modern notion of
ergodicity, but it should not be attempted here, as Boltzmann himself did
not attempt it.

In general one can hardly conceive that studying the continuum problem
could lead to really new information that cannot be obtained by taking a discrete
viewpoint. Of course some problems might still be easier if studied in the
continuum, and the few results on ergodicity of physical systems do in fact 
rely explicitly on continuum models, [Si70]. However I rather interpret such 
results as illustrations of the complex nature of the discrete model: for 
instance the ergodicity theory of a system like billiards is very enlightening 
as it allows us to get some ideas on the question of whether there exist 
other ergodic distributions on the energy surface (in the sense of ergodic 
theory the answer is affirmative), and what is their meaning. The theory 
of the continuum models has been essential in providing new insights in the 
description of nonequilibrium phenomena, [RT71], [Ru78], [CELS93]. 

Finally the fruitfulness of the discrete models can be even more appreciated 
if one notes that they have been the origin of the quantum theory of radi- 
ation: it has even been maintained that Boltzmann had already obtained 
the Bose-Einstein statistics, [Ba90]. 

The latter is a somewhat strong interpretation of the 1877 paper, [Bo77].
The most attentive readers of Boltzmann have, in fact, noted that in his discretizations
he uses, eventually, the continuum limit as a device to expedite
the computations, manifestly not remarking that sticking to the discrete
viewpoint would lead to important differences in some extreme cases. In
fact he does not discuss the two main “errors”, see Chap.III, that one commits
in regarding a continuum formulation as an approximation (based on
replacing integrals with sums); they were exploited for the first time by
Planck, much later. The latter errors amount, in modern language, see
Chap.III, to the identification of the Maxwell-Boltzmann statistics and the
Bose-Einstein statistics, and to neglecting the variation of physically relevant
quantities over the cells: see the lucid analysis in [Ku87], p. 60; for a
technical discussion see Chap.III.

The above “oversight” might simply be a proof that Boltzmann never took
the discretization viewpoint to its extreme consequences, among which there
is that the equilibrium ensembles are no longer orthodic in the sense of
Boltzmann, see Chap.III (although they still provide a model for thermodynamics
provided the temperature is no longer identified with the average
kinetic energy), a remark that very likely was not made by Boltzmann in
spite of his consideration of and interest in the possibility of finding other
integrating factors for the heat transfer dQ, see the footnote on p. 152 in
[Bo84].¹¹

The necessity of an understanding of this “oversight” has been in particular 
clearly advocated by Kuhn referring to Boltzmann’s “little studied views 
about the relation between the continuum and the discrete”, [Ku87], for 
instance. 

There are many directions into which the analysis of the foundations of
classical statistical mechanics can be developed. A somewhat different viewpoint
for instance can be found in [Kr79]: this work of Krylov, and particularly
part III, has been very influential on Russian theoretical physics. In it,
besides a very detailed critique of the foundations and of Boltzmann’s and
Gibbs’ work, the foundations of the theory of the ergodicity of hard sphere
systems are laid down: it was pursued later by Sinai. It also provided grounds
for subsequent work on coarse graining (see Chap.IX) of Sinai and, in Sinai’s
interpretation, [Si79], also inspiration for the later theory of chaotic systems,
[Si72], quite close to Ruelle’s proposal, see Chap.IX and [Ru78c].


Appendix 1.A1. Monocyclic systems, Keplerian Motions and Ergodic Hypothesis


Consider a one-dimensional system with potential $\varphi(x)$ such that $|\varphi'(x)| > 0$
for $|x| > 0$, $\varphi''(0) > 0$ and $\varphi(x) \to +\infty$ as $|x| \to \infty$. All motions are periodic so
that the system is monocyclic. We suppose that the potential $\varphi(x)$ depends
on a parameter V.

One defines a state to be a motion with given energy E and given V. And:

U = total energy of the system $\equiv K + \varphi$
T = time average of the kinetic energy K
V = the parameter on which $\varphi$ is supposed to depend
p = − average of $\partial_V \varphi$.


A state is parameterized by U,V and if such parameters change by dU, dV, 
respectively, we define: 


dL = —pdV, dQ = dU + pdV . (1.A1.1) 
Then: 
Theorem (Helmholtz): The differential (dU + pdV)/T is exact.


¹¹ In checking my understanding of the original paper as partially discussed in [Ga81], I
have profited from an English translation that Dr. J. Renn kindly provided me with
later (1984). He noticed this footnote in [Bo84] while performing his translation (unfortunately
still unpublished).

In fact let $x_\pm(U,V)$ be the extremes of the oscillations of the motion with
given U, V and define S as:

$$S = 2\log\int_{x_-(U,V)}^{x_+(U,V)} \sqrt{K(x;U,V)}\,dx = 2\log\int_{x_-(U,V)}^{x_+(U,V)} \sqrt{U - \varphi(x)}\,dx \qquad (1.A1.2)$$

so that

$$dS = \frac{\int \big(dU - \partial_V\varphi(x)\,dV\big)\,\frac{dx}{\sqrt{K}}}{\int K\,\frac{dx}{\sqrt{K}}} \qquad (1.A1.3)$$

and, noting that $\frac{dx}{\sqrt{K}} = \sqrt{\frac{2}{m}}\,dt$, we see that the time averages are given by
integrating with respect to $\frac{dx}{\sqrt{K}}$ and dividing by the integral of $\frac{dx}{\sqrt{K}}$. We find
therefore

$$dS = \frac{dU + p\,dV}{T} \qquad (1.A1.4)$$

The above analysis admits an extension to Keplerian motions: such systems
are not monocyclic in the sense of Helmholtz, but if one considers only
motions with a fixed eccentricity they have the same properties.

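It is convenient to study motions in polar coordinates $(\rho, \vartheta)$, so that if
$A = \rho^2\dot\vartheta$, $E = \frac12 m\dot{\mathbf x}^2 - \frac{gm}{\rho}$, m being the mass and g the strength of the
attraction due to gravity (g = kM if k is the gravitational constant and M
is the central mass) then

$$E = \frac12 m\dot\rho^2 + \frac{mA^2}{2\rho^2} - \frac{mg}{\rho}, \qquad \varphi(\rho) = -\frac{gm}{\rho} \qquad (1.A1.5)$$

and

$$\dot\rho^2 = \frac{2}{m}\Big(E - \frac{mA^2}{2\rho^2} + \frac{mg}{\rho}\Big) \stackrel{def}{=} A^2\Big(\frac{1}{\rho} - \frac{1}{\rho_+}\Big)\Big(\frac{1}{\rho_-} - \frac{1}{\rho}\Big),$$
$$\frac{1}{\rho_+\rho_-} = \frac{-2E}{mA^2}, \qquad \frac{1}{\rho_+} + \frac{1}{\rho_-} = \frac{2g}{A^2}, \qquad \frac{\rho_+ + \rho_-}{2} \stackrel{def}{=} a = \frac{mg}{-2E},$$
$$\sqrt{\rho_+\rho_-} \stackrel{def}{=} a\sqrt{1-e^2}, \qquad \sqrt{1-e^2} = \frac{A}{\sqrt{ag}} \qquad (1.A1.6)$$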


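Furthermore if a motion with parameters (E, A, g) is periodic (hence E < 0)
and if $\langle\cdot\rangle$ denotes a time average over a period then

$$E = -\frac{mg}{2a}, \qquad \langle\varphi\rangle = -\frac{mg}{a}, \qquad \Big\langle\frac{1}{\rho^2}\Big\rangle = \frac{1}{a^2\sqrt{1-e^2}},$$
$$\langle K\rangle = \frac{mg}{2a} = -E, \qquad T \stackrel{def}{=} \frac{mg}{2a} \equiv \langle K\rangle, \qquad T_{ecc} \stackrel{def}{=} \big(1 - \sqrt{1-e^2}\big)\,T \qquad (1.A1.7)$$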
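
The averages in (1.A1.7) can be verified numerically. The Python sketch below is an illustration under assumptions that are not part of the text: it parameterizes the ellipse by the eccentric anomaly u, using the standard relations $\rho = a(1 - e\cos u)$ and $dt \propto (1 - e\cos u)\,du$, and the names `kepler_time_average` and `kepler_averages` are introduced here. It computes $\langle 1/\rho\rangle$ and $\langle 1/\rho^2\rangle$ by a midpoint rule and then the average of the radial part of the kinetic energy, which should equal $T_{ecc} = (1 - \sqrt{1-e^2})\,T$.

```python
import math


def kepler_time_average(f, a, e, n=2000):
    """Time average over one period of f(rho) on an ellipse with semi-major
    axis a and eccentricity e. Uses the eccentric anomaly u, with
    rho = a (1 - e cos u) and dt proportional to (1 - e cos u) du."""
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        w = 1.0 - e * math.cos(u)
        total += f(a * w) * w
    # The weights w integrate to 2*pi over a period, so (h * total) / (2*pi)
    # reduces to total / n.
    return total / n


def kepler_averages(m, g, a, e):
    """Averages entering (1.A1.7) for given mass m, strength g, and (a, e)."""
    A = math.sqrt(a * g * (1.0 - e**2))      # from sqrt(1 - e^2) = A / sqrt(a g)
    inv_rho = kepler_time_average(lambda r: 1.0 / r, a, e)
    inv_rho2 = kepler_time_average(lambda r: 1.0 / r**2, a, e)
    E = -m * g / (2.0 * a)
    T = E + m * g * inv_rho                  # <K> = E - <phi>, phi = -m g / rho
    T_radial = T - 0.5 * m * A**2 * inv_rho2 # radial part of <K>
    return {"E": E, "T": T, "inv_rho2": inv_rho2, "T_radial": T_radial}


if __name__ == "__main__":
    # Illustrative values only (assumed, not from the text).
    print(kepler_averages(m=1.0, g=1.0, a=2.0, e=0.6))
```

With m = g = 1, a = 2 and e = 0.6 one expects T = 0.25, $\langle 1/\rho^2\rangle = 0.3125$ and a radial average equal to $0.2 \times 0.25 = 0.05$, in agreement with the definition of $T_{ecc}$.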
Hence if S is defined by

$$S = 2\log\int_{\rho_-}^{\rho_+}\sqrt{\frac{2}{m}\Big(E - \frac{mA^2}{2\rho^2} + \frac{mg}{\rho}\Big)}\,d\rho \qquad (1.A1.8)$$

its differential is

$$dS = \frac{dE - \frac{m}{2a^2\sqrt{1-e^2}}\,dA^2 + \frac{m}{a}\,dg}{\big(1 - \sqrt{1-e^2}\big)\,T} \qquad (1.A1.9)$$
This means that:
(1) If (E, A, g) are regarded as parameters then $T_{ecc}$ is an integrating factor
of:

$$dQ = dE + p_A\,dA^2 + p_g\,dg, \qquad p_A = \frac{-m}{2a^2\sqrt{1-e^2}}, \qquad p_g = \frac{m}{a} \qquad (1.A1.10)$$


(2) Suppose that e is kept constant, so that the states are characterized by
(E, g). Then using $\sqrt{1-e^2} = \sqrt{-2E/m}\,A\,g^{-1}$ and $-E = T$, i.e.

$$\frac{dA^2}{A^2} + \frac{d(-E)}{-E} - 2\,\frac{dg}{g} = 0 \qquad (1.A1.11)$$

one can eliminate $dA^2/A^2$ from dS and find (after some simple algebra):

$$dS = \frac{dE + (-2E)\,g^{-1}\,dg}{T} = d\log\frac{g^2}{-E} \qquad (1.A1.12)$$

so that T is the integrating factor of dQ = dE + pdV if V = g and
$p = -2E/g = 2T/g$ (Boltzmann). Note that the equations pg = 2T and E = −T can
be interpreted as, respectively, analogues of the “equation of state” and the
“ideal specific heat” laws (with the “volume” being g, the “gas constant”
being R = 2 and the “specific heat” $C_V = 1$).


(3) If g is kept constant and $(E, A^2)$ determine the states, the integrating
factor of $dQ = dE + p_A\,dA^2$, with $p_A = \frac{-m}{2a^2\sqrt{1-e^2}}$, is not the average kinetic
energy T but the eccentric temperature $T_{ecc}$.


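To check (2) note that by (1.A1.9), (1.A1.11)

$$dS = \frac{dE\,\Big(1 - \frac{mA^2}{2a^2\sqrt{1-e^2}}\,\frac{1}{-E}\Big) - \frac{2mA^2}{2a^2\sqrt{1-e^2}}\,\frac{dg}{g} + \frac{m}{a}\,dg}{\big(1 - \sqrt{1-e^2}\big)\,T}$$
$$= \frac{\big(dE + \frac{m}{a}\,dg\big)\big(1 - \sqrt{1-e^2}\big)}{\big(1 - \sqrt{1-e^2}\big)\,T} = \frac{dE + \frac{m}{a}\,dg}{T} \qquad (1.A1.13)$$
$$= \frac{dE - 2E\,\frac{dg}{g}}{T} = d\log\frac{g^2}{-E}$$

This concludes the discussion of Boltzmann’s version of Helmholtz’s theory.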


In general one can call a system monocyclic when it has the property
that there is a curve $\ell \to x(\ell)$, parameterized by its curvilinear abscissa $\ell$,
varying in an interval $0 < \ell < L(E)$, closed and such that $x(\ell)$ covers all
the positions compatible with the given energy E.
Let $x = x(\ell)$ be the parametric equations so that the conservation of energy
can be written:

$$\frac{m}{2}\,\dot\ell^{\,2} + \varphi(x(\ell)) = E \qquad (1.A1.14)$$

Then if we suppose that the potential energy $\varphi$ depends on a parameter V
and if T is the average kinetic energy, $p = -\langle\partial_V\varphi\rangle$, it follows:

$$dS = \frac{dE + p\,dV}{T}, \qquad p = -\langle\partial_V\varphi\rangle, \qquad T = \langle K\rangle \qquad (1.A1.15)$$


A typical case to which the above can be applied is the case in which the 
whole space of configurations is covered by the projection of a single periodic 
motion and the whole energy surface consists of just one periodic orbit, or 
at least only the phase space points that are on such an orbit are observable. 
Such systems provide natural models of thermodynamic behavior. 

Noting that a chaotic system like a gas in a container of volume V will 
satisfy “for practical purposes” the above property we see that we should 
be able to find a quantity p such that dE + pdV admits the inverse of the 
average kinetic energy as an integrating factor. 

On the other hand the distribution generated on the surface of constant 
energy by the time averages over the trajectory should be an invariant dis- 
tribution and therefore a natural candidate for it is the uniform distribution, 
Liouville distribution, on the surface of constant energy. 

It follows that if $\mu$ is the Liouville distribution and $T$ is the average kinetic
energy with respect to $\mu$ then there should exist a function $p$ such that $T^{-1}$
is the integrating factor of $dE + p\,dV$.

Boltzmann showed that this is the case and, in fact, p is the average mo- 
mentum transfer to the walls per unit time and unit surface, i.e. it is the 
physical pressure. 

Clearly this is not a proof that the equilibria are described by the micro- 
canonical ensemble. However it shows that for most systems, independently 
of the number of degrees of freedom, one can define a mechanical model of 
thermodynamics. The reason we observe approach to equilibrium over time
scales far shorter than the recurrence times is (as discussed in the previous
sections) the property that on most of the energy surface the actual
values of the observables whose averages yield the pressure and tempera- 
ture assume the same value. This implies that this value coincides with the 
average and therefore satisfies the heat theorem, as Boltzmann called the 
statement that (dE + pdV)/T is an exact differential if p is the pressure 
(defined as the average momentum transfer to the walls per unit time and 
unit surface) and T is proportional to the average kinetic energy. 


Appendix 1.A2. Grad-Boltzmann Limit and Lorentz’s Gas 


It is interesting to see how to derive Boltzmann’s equation in simple models
in which it becomes a linear equation. The models are well known since
Lorentz introduced them in his attempt to establish more firmly Drude’s
theory of electric conduction in metals.

In the models there are two types of particles: the W-particles (wind- 
particles) and the T-particles (tree-particles). 

The W-particles move through space interacting only with the T-particles 
which, however, are supposed to be infinitely heavy compared to the W- 
particle and are supposed to be at rest and randomly distributed in space. 

Each model is completely described by the W — T interaction and by the 
T-particle distribution. From now on we shall focus our interest on the case 
in which the T-particles are distributed as the space distribution of a perfect 
gas (Poisson’s distribution) with density n. We shall also assume that the 
T-particles are, with respect to the W-particles, hard spheres of radius a, 
reflecting the W-particles on their surface. 

The assumed tree distribution is such that the probability for finding inside
a given region $\Lambda$, with volume $V(\Lambda)$, exactly $N$ tree particles, and for finding
them in the infinitesimal cubes $dc_1,\ldots,dc_N$ around $c_1,\ldots,c_N$, is:

$$f_\Lambda(c_1,\ldots,c_N)\,\frac{dc_1\cdots dc_N}{N!} = e^{-nV(\Lambda)}\, n^N\, \frac{dc_1\cdots dc_N}{N!} \tag{1.A2.1}$$

where the parameter $n$ has the interpretation of density of the tree particles.
Note that, since the T-particles are hard spheres only with respect to the
W-particles but not with respect to each other, there are configurations
$c_1,\ldots,c_N$ of trees in which the hard spheres overlap (for some comments
on this point see §1.6).
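A Poisson distribution of trees of the kind just described is easy to sample. The following is a minimal sketch in two dimensions, assuming a square box; the function names, the box side and the density value are hypothetical. The number of trees is Poisson distributed with mean $nV(\Lambda)$ and, given that number, the positions are independent and uniform, so overlapping trees are allowed.

```python
import random

def sample_poisson(mean, rng):
    """Poisson sample: count unit-rate exponential arrivals before time `mean`."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

def sample_trees(n, side, rng):
    """T-particle centers in a square box of the given side, with density n."""
    number = sample_poisson(n * side * side, rng)
    return [(rng.uniform(0.0, side), rng.uniform(0.0, side)) for _ in range(number)]

rng = random.Random(0)        # fixed seed, for reproducibility only
n, side = 5.0, 2.0            # hypothetical density and box side: n V = 20
counts = [len(sample_trees(n, side, rng)) for _ in range(4000)]
mean_count = sum(counts) / len(counts)
var_count = sum((c - mean_count) ** 2 for c in counts) / len(counts)
```

For a Poisson distribution both the mean and the variance of the number of trees should be close to $nV(\Lambda)$.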
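If $x = (p,q)$ is a W-particle phase space coordinate ($p$ = velocity, $q$ = position)
the symbol

$$S_t^{c_1,\ldots,c_N}\, x \tag{1.A2.2}$$

will denote the W-particle coordinate $x' = (p',q')$ into which $x$ evolves in
time $t$ in the presence of $N$ tree-particles located at $c_1,\ldots,c_N$. The symbol
$\omega(p)$ will denote the direction of $p$ and $\hat x$ will denote the pair $(\omega(p),q)$ if
$x = (p,q)$.

Since the speed $|p|$ is conserved it is clear that $S_t^{c_1,\ldots,c_N} x$ depends only
on the trees located within a distance $(|p|t + a)$ from $q$. The symbols:

$$\big(S_t^{c_1,\ldots,c_N} x\big)_1, \qquad \big(S_t^{c_1,\ldots,c_N} x\big)_2, \qquad \omega\big(S_t^{c_1,\ldots,c_N} x\big) \tag{1.A2.3}$$

will, respectively, denote the velocity, position and momentum direction of
(1.A2.2); and we also set:

$$\hat S_t^{c_1,\ldots,c_N}\,\hat x = \Big(\omega\big(S_t^{c_1,\ldots,c_N} x\big),\ \big(S_t^{c_1,\ldots,c_N} x\big)_2\Big) \tag{1.A2.4}$$

Similarly we can give a natural meaning to the evolution of $m$ W-particles:

$$S_t^{c_1,\ldots,c_N}(x_1,\ldots,x_m) = \big(S_t^{c_1,\ldots,c_N} x_1,\ \ldots,\ S_t^{c_1,\ldots,c_N} x_m\big) \tag{1.A2.5}$$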
which takes into account the fact that there are no W-W interactions.

It is easy to derive Boltzmann’s equation for W-particles in the case of the
Lorentz gas described above, see (1.A2.10) below. One realizes that the
assumptions to be made in order to derive the Boltzmann equation are
essentially the same as conditions 1), 2), 3), of Sect. 8. They are:


(i) a W-particle never hits the same T-particle twice;
(ii) molecular chaos is assumed; 
(iii) the size of the T-particles is negligible. 


Here by a “chaotic” W-particle state we again mean a state such that the
W-particle correlation functions are products of one-W-particle distributions
which are independent of the T-particle distribution. More precisely
a chaotic state is such that the probability distribution for finding a certain
configuration $C$ of T-particles and a set of W-particles in $x_1,\ldots,x_m$ has
the form $p(C) \prod_i f_0(x_i)$, where $p(C)$ denotes the distribution (1.A2.1)
and this is interpreted as $0$ if any wind particle is inside the hard cores of
$C$.¹²

Clearly assumptions (i), (ii) and (iii) can be only approximately true. 

Let us formulate Grad’s limit conjecture for the Lorentz gas. Assume that
the initial W-particles state is such that the probability density for finding
W-particles in $dx_1 \ldots dx_m$ is $m!^{-1}$ times:

$$f_0(x_1,\ldots,x_m) = \int_{C\ \mathrm{comp}\ (x_1,\ldots,x_m)} p(C) \prod_{i=1}^m f_0(x_i) \tag{1.A2.6}$$

where $f_0(x)$ is a given function of $x$ and the “integral” is the “sum” over
all the T-particle configurations compatible with $x_1,\ldots,x_m$ (i.e. over the
$C$’s such that no W-particle is located inside the hard core of a T-particle).
The compatibility between $(x_1,\ldots,x_m)$ and $C$ is expressed by the notation
$(x_1,\ldots,x_m)\ \mathrm{comp}\ C$.

Note that (1.A2.6) is not a product state for the W-particles: this is so
because here we have hard core interactions between them and the T-particles.

¹² Explicitly this means the following. Let $p$ be the probability of finding the W-particles
in an infinitesimal cube $dx_1 \ldots dx_m$ around the configuration $X = (x_1,\ldots,x_m)$
in the box $\Lambda_0$, and a tree configuration in the infinitesimal cube $dc_1 \ldots dc_N$ around
$C = (c_1,\ldots,c_N)$ in the box $\Lambda$, assuming it wider by an amount $a$ than $\Lambda_0$, at
least. Here $x_i = (p_i, q_i)$. Then $p$ is the product of the probability in (1.A2.1) times
$\frac{1}{m!}\big(\prod_{i=1}^m f_0(x_i)\,dx_i\big)\,e^{-\int f_0(\xi)\,d\xi}$, where $\xi = (p,q)$ and $\int f_0\,d\xi$ means integration over
$p$ and over the $q \in \Lambda_0$ which are outside the hard spheres centered on $C = (c_1,\ldots,c_N)$.
In other words the W particles also have a Poisson distribution, in the region outside
the T particles, with a density function $f_0$.

Consider the state obtained by evolving the initial state (1.A2.6):

$$f(x_1,\ldots,x_m;t) = \int_{C\ \mathrm{comp}\ (x_1,\ldots,x_m)} p(C) \prod_{i=1}^m f_0\big(S_{-t}^{C} x_i\big) \tag{1.A2.7}$$

and then let the T-particle density $n$ tend to infinity and the hard core W-T
radius tend to zero in such a way that $na^3 \to 0$ but $4\pi n a^2 \to \lambda^{-1} \ne 0, \infty$. We
shall imagine that the solid angle integration is normalized so that $\int d\omega = 1$.
Grad’s conjecture can then be formulated as:


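If $t > 0$ and under “mild assumptions” on $f_0$, the following limit exists:

$$\lim_{na^3 \to 0,\ 4\pi n a^2 \to \lambda^{-1}} f(x_1,\ldots,x_m;t) = \bar f(x_1,\ldots,x_m;t) \tag{1.A2.8}$$

and:

$$\bar f(x_1,\ldots,x_m;t) = \prod_{i=1}^m \bar f(x_i;t) \tag{1.A2.9}$$

and $\bar f(x;t)$ satisfies the Boltzmann equation:

$$\frac{\partial \bar f}{\partial t}(x,t) + p\cdot\frac{\partial \bar f}{\partial q}(x,t) = \lambda^{-1}\,|p| \int \big(\bar f(x',t) - \bar f(x,t)\big)\,\sigma(\omega)\,d\omega \tag{1.A2.10}$$

where $x = (p,q)$, $x' = (p',q)$ and $p'$ is a vector with the same length as $p$ but
forming an angle $\omega$ with it; $a^2\sigma(\omega) = a^2/4$ is the scattering cross-section of a
hard sphere with radius $a$ and $\lambda^{-1} = 4\pi n a^2$.

A similar conjecture can be formulated in a two-dimensional model; here the
solid angle $\omega$ has to be replaced by the deflection angle $\beta$ (see Fig. 1.A2.2)
and $\sigma(\omega)$ by $\sigma(\beta) = \frac12 \sin\frac{\beta}{2}$ and $\lambda^{-1} = 2an$. Of course the Boltzmann limit
will be, in this case, $na^2 \to 0$, $2na \to \lambda^{-1} \ne 0, \infty$.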
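The two-dimensional cross-section can be checked by a short simulation. The sketch below assumes specular reflection of a point particle on a hard disk of radius $a$, with the impact parameter $b$ uniformly distributed in $[-a, a]$; the function names and the numerical values are hypothetical. Under these assumptions $b = a\cos(\beta/2)$, so that $|db| = \frac{a}{2}\sin\frac{\beta}{2}\,d\beta$, and the deflection angle has probability density $\frac14\sin\frac{\beta}{2}$ on $[0, 2\pi]$.

```python
import math
import random

def deflection_angle(b, a):
    """Deflection angle for impact parameter b on a hard disk of radius a."""
    s = max(-1.0, min(1.0, b / a))      # guard against rounding outside [-1, 1]
    return math.pi - 2.0 * math.asin(s)

def predicted_cdf(beta):
    """Integral from 0 to beta of (1/4) sin(x/2) dx."""
    return 0.5 * (1.0 - math.cos(beta / 2.0))

rng = random.Random(1)                  # fixed seed, for reproducibility only
a = 0.3                                 # hypothetical disk radius
betas = [deflection_angle(rng.uniform(-a, a), a) for _ in range(20000)]

def empirical_cdf(beta):
    return sum(1 for x in betas if x <= beta) / len(betas)
```

Comparing `empirical_cdf` with `predicted_cdf` at a few angles gives agreement within the statistical error; the total cross-section $\int_0^{2\pi} a\,\sigma(\beta)\,d\beta = 2a$ is the diameter of the disk, consistent with $\lambda^{-1} = 2an$.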


It is easy to construct a proof of the above conjecture in the two-dimensional
case. The three-dimensional case could be treated along the
same lines as will become apparent from the proofs. We shall assume, for
simplicity, the spatial dimension to be two. The direction $\omega(p)$ will be in
this case the angle $\vartheta$ between $p$ and a fixed axis.

We first specify the “mild assumptions” on $f_0$. The function $f_0(x)$ will be
thought of as $f_0(|p|,\omega(p),q)$, if $x = (p,q)$, and we can write:

$$f_0(|p|,\omega(p),q) = \int dq'\,d\omega'\, f_0(|p|,\omega',q')\,\delta(q - q')\,\delta(\omega(p) - \omega') \tag{1.A2.11}$$

we shall abbreviate $(\omega', q')$ to $\xi$, $dq'd\omega'$ to $d\xi$ and $\delta(q - q')\delta(\omega(p) - \omega')$ to
$\delta(\hat x - \xi)$. Hence, by using definition (1.A2.4), Eq. (1.A2.7) becomes, for
$m = 1$:

$$f(x;t) = \int d\xi\, f_0(|p|,\xi) \int_{C\ \mathrm{comp}\ x} p(C)\,\delta\big(\hat S_{-t}^{C}\hat x - \xi\big) \tag{1.A2.12}$$

It is therefore useful to consider the function:

$$g(\xi;x;t) = e^{\pi n a^2} \int_{C\ \mathrm{comp}\ x} p(C)\,\delta\big(\hat S_{-t}^{C}\hat x - \xi\big) \tag{1.A2.13}$$

where the factor $e^{\pi n a^2}$ has been introduced for normalization purposes (note
that it tends to $1$, in the Boltzmann limit). It is easily checked that:

$$g(\xi;x;0) = \delta(\hat x - \xi), \qquad \int g(\xi;x;t)\,d\xi = 1,$$
$$f(x,t) = e^{-\pi n a^2} \int d\xi\, f_0(|p|,\xi)\, g(\xi;x;t). \tag{1.A2.14}$$


We shall show that as $na^2 \to 0$, $2na \to \lambda^{-1} \ne 0, \infty$ the function $g(\xi;x;t)$
will tend to a limit $\bar g(\xi;x;t)$ which satisfies the two dimensional analogue of
relation (1.A2.10) with initial condition $\bar g(\xi;x;0) = \delta(\hat x - \xi)$ and $|p|$ fixed.
The linearity of (1.A2.10), and of the third (1.A2.14), will imply, under
suitable assumptions on $f_0$, that also $\bar f(x,t)$ satisfies (1.A2.10).

We will not insist on discussing in which sense $g(\xi;x;t)$ converges to
$\bar g(\xi;x;t)$. It will appear from the proofs below that, if $x(t) = (q + pt, p)$,
at least $g_0(\xi;x;t) \equiv g(\xi;x;t) - e^{-2na|p|t}\,\delta(\xi - \hat x(t))$ converges to
$\bar g_0(\xi;x;t) \equiv \bar g(\xi;x;t) - e^{-\lambda^{-1}|p|t}\,\delta(\xi - \hat x(t))$ pointwise for $t \ne 0$, and in the
sense of the distributions for all $t \ge 0$. However a close examination of the
proof will provide evidence against any uniformity of the convergence in $t$,
unless $t$ is restricted to a bounded interval (for further remarks on this point
see below).

Under the above convergence conditions, “mild assumptions” could, for
instance, be continuity and boundedness of $f_0$. The proof is based on a
simple change of variables in (1.A2.13).


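Let $x = (p,q)$ and let $R(x,t)$ be the sphere with center $q$ and radius
$(|p|t + a)$; then $S_{-t}^{C} x$ depends only on the T-particles in $C$ contained in
$R(x,t)$. Hence the integral (1.A2.13) can be explicitly written as:

$$g(\xi;x;t) = e^{\pi n a^2} \sum_{M=0}^{\infty} \int_{R(x,t)^M} e^{-nV(R(x,t))}\,\frac{n^M}{M!}\,\chi_{c_1,\ldots,c_M}(x)\,\delta\big(\hat S_{-t}^{c_1,\ldots,c_M}\hat x - \xi\big)\,dc_1 \cdots dc_M \tag{1.A2.15}$$

where $V(R(x,t))$ = area of $R(x,t)$, $\chi_{c_1,\ldots,c_M}(x)$ is $1$ if $x$ is compatible with the
hard cores of $c_1,\ldots,c_M$ and $0$ otherwise, and where use has been made of the
assumed Poisson distribution of the T-particles (1.A2.1).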

Note that, in general, not all the T-particles $c_1,\ldots,c_M$ in (1.A2.15) will
be hit by the trajectory $S_{-\tau}^{c_1,\ldots,c_M} x$, $0 \le \tau \le t$. Let $A_{x,t,N}$ denote the set
of configurations $c_1,\ldots,c_N$ of $N$ T-particles such that a W-particle with
initial coordinate $x$ hits, in the time $t$, all the $N$ particles in $c_1,\ldots,c_N$ at
least once. We deduce from (1.A2.15), see Fig. 1.A2.1:

$$g(\xi;x;t) = e^{\pi n a^2} \sum_{N=0}^{\infty} \int_{A_{x,t,N}} n^N\,\frac{dc_1 \cdots dc_N}{N!}\,\chi_{c_1,\ldots,c_N}(x)\,\delta\big(\hat S_{-t}^{c_1,\ldots,c_N}\hat x - \xi\big)\cdot$$
$$\cdot\Big[\sum_{M=N}^{\infty} \int_{\mathcal R} n^{M-N}\,\frac{dc'_1 \cdots dc'_{M-N}}{(M-N)!}\,e^{-nV(R(x,t))}\Big] \tag{1.A2.16}$$

$\mathcal R$ = set of points in $R(x,t)^{M-N}$ such that $c'_1,\ldots,c'_{M-N} \notin P(t;c_1,\ldots,c_N)$

where $\chi_{c_1,\ldots,c_N}(x)$ is $1$ if $x$ is compatible with the hard cores of $c_1,\ldots,c_N$
and $0$ otherwise: the region $P(t;c_1,\ldots,c_N)$ is the tube like region (see Fig.
1.A2.1) swept by an ideal T-particle when its center is moved along the path
of the trajectory $S_{-\tau}^{c_1,\ldots,c_N} x$, $0 \le \tau \le t$.

Fig. 1.A2.1: The set $P(t;c_1,\ldots,c_N)$ is the dashed region. The circles represent trees
$c_1,\ldots,c_N$ ($N = 5$) and the length of the trajectory in the dashed region is $|p|t$.

The sum within square brackets in (1.A2.16) can be performed (since the
integrals are trivial) and yields:

$$e^{-nV(P(t;c_1,\ldots,c_N))} \tag{1.A2.17}$$

so that $g(\xi;x;t)$ is:

$$g(\xi;x;t) = e^{\pi n a^2} \sum_{N=0}^{\infty} \int_{A_{x,t,N}} n^N\,e^{-nV(P(t;c_1,\ldots,c_N))}\,\chi_{c_1,\ldots,c_N}(x)\,\delta\big(\hat S_{-t}^{c_1,\ldots,c_N}\hat x - \xi\big)\,\frac{dc_1 \cdots dc_N}{N!} \tag{1.A2.18}$$

The reader should note the very simple probabilistic meaning of this equation
which makes it almost self-evident. The T-particles in $A_{x,t,N}$ can be hit
more than once in the time $t$. Divide $A_{x,t,N}$ as $A^1_{x,t,N} \cup A'_{x,t,N}$ where $A^1_{x,t,N}$
is the set of T-configurations in $A_{x,t,N}$ such that all their T-particles are hit
just once by the trajectory $S_{-\tau}^{c_1,\ldots,c_N} x$, $0 \le \tau \le t$.

To this decomposition of $A_{x,t,N}$ there corresponds a decomposition
$g(\xi;x;t) = g_1(\xi;x;t) + g'(\xi;x;t)$ with

$$g_1(\xi;x;t) = e^{\pi n a^2} \sum_{N=0}^{\infty} \int_{A^1_{x,t,N}} n^N\,\frac{dc_1 \cdots dc_N}{N!}\,\chi_{c_1,\ldots,c_N}(x)\,\delta\big(\hat S_{-t}^{c_1,\ldots,c_N}\hat x - \xi\big)\,e^{-nV(P(t;c_1,\ldots,c_N))} \tag{1.A2.19}$$

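We now perform the change of variables, illustrated in Fig. 1.A2.2, from the
$2N$ variables $c_1,\ldots,c_N$ to the new $2N+1$ variables $l_1,\ldots,l_{N+1}$, $\beta_1,\ldots,\beta_N$;
we get

$$\frac{dc_1 \cdots dc_N}{N!} = a^N \Big(\prod_{j=1}^{N+1} dl_j\Big)\,\delta\Big(\sum_{j=1}^{N+1} l_j - |p|t\Big) \prod_{j=1}^{N} \sin\frac{\beta_j}{2}\,\frac{d\beta_j}{2} \tag{1.A2.20}$$

represented as:

Fig. 1.A2.2

Hence the $N$-th order contribution to (1.A2.19) is given by (if $\omega(p) = \vartheta$
and $x = (p,q) = (|p|,\vartheta,q)$, $\xi = (|p|,\vartheta',q')$):

$$e^{\pi n a^2}\,(2na)^N \int^{*} \prod_{j=1}^{N+1} dl_j \int_0^{2\pi} \prod_{j=1}^{N} \sin\frac{\beta_j}{2}\,\frac{d\beta_j}{4}\;\delta\Big(\sum_{j=1}^{N+1} l_j - |p|t\Big)\cdot$$
$$\cdot\,\delta\Big(\sum_{i=1}^{N+1} L_i - (q - q')\Big)\,\delta\Big(\sum_{i=1}^{N} \beta_i - (\vartheta' - \omega(p))\Big)\,e^{-nV(P(t;c_1,\ldots,c_N))} \tag{1.A2.21}$$

where $L_i$ are the vectors represented by arrows in Fig. 1.A2.2 ($|L_i| = l_i$); the $*$
in (1.A2.21) means that there is an extra condition on the integration region.
It is the condition that none of the spheres of radius $a$ around $c_1,\ldots,c_N$
has intersection with the straight segments of the broken line representing
the trajectory in Fig. 1.A2.2 (i.e. this is the condition that $c_1,\ldots,c_N$ really
belong to $A^1_{x,t,N}$). Of course in (1.A2.21), $\delta\big(\sum_i \beta_i - (\vartheta' - \omega(p))\big)$ means
$\sum_{h=-\infty}^{\infty} \delta\big(\sum_i \beta_i - (\vartheta' - \omega(p)) - 2\pi h\big)$.

In the limit $na^2 \to 0$, $2na \to \lambda^{-1} \ne 0, \infty$ the restrictions indicated by
the $*$ in (1.A2.21) become unimportant and $nV(P(t;c_1,\ldots,c_N))$ simplifies
enormously:

$$nV(P(t;c_1,\ldots,c_N)) \to 2na \sum_{j=1}^{N+1} l_j = \lambda^{-1}\,|p|t \tag{1.A2.22}$$

Hence the limit $\bar g(\xi;x;t)$ as $na^2 \to 0$ and $2na \to \lambda^{-1} \ne 0, \infty$ of $g_1(\xi;x;t)$ is:

$$\bar g(\xi;x;t) = \sum_{N=0}^{\infty} \lambda^{-N} \int_0^{\infty}\!\!\int_0^{2\pi} \Big(\prod_{j=1}^{N} \sin\frac{\beta_j}{2}\,\frac{d\beta_j}{4}\Big)\Big(\prod_{j=1}^{N+1} dl_j\Big)\,\delta\Big(\sum_{j=1}^{N+1} l_j - |p|t\Big)\cdot$$
$$\cdot\,\delta\Big(\sum_{i=1}^{N+1} L_i - (q - q')\Big)\,\delta\Big(\sum_{i=1}^{N} \beta_i - (\vartheta' - \omega(p))\Big)\,e^{-\lambda^{-1}|p|t} \tag{1.A2.23}$$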

In the derivation of (1.A2.23) we have systematically disregarded convergence
problems connected with the summation over $N$, $M$, etc., since they
are trivial as a consequence of the presence of the factorials and of the
boundedness of the integration regions. The limit (1.A2.23) is pointwise for
$t \ne 0$ and it could be checked that it holds also in the sense of distributions
for $t \ge 0$.
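The exponential weight of the collisionless paths can be illustrated by a small Monte Carlo experiment. The sketch below assumes Poisson distributed disks of radius $a$ and a straight path of length $|p|t$ starting from a point compatible with the trees; all names and parameter values are hypothetical. The set of tree centers within distance $a$ of the path is a stadium of area $2a|p|t + \pi a^2$, and conditioning on the starting point being outside every hard core removes a disk of area $\pi a^2$, so the conditional probability of no collision is $e^{-2na|p|t}$.

```python
import math
import random

def sample_poisson(mean, rng):
    """Poisson sample: count unit-rate exponential arrivals before time `mean`."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

def survives(n, a, length, rng):
    """True if the path from (0,0) to (length,0) meets no tree of radius a.

    Trees are Poisson with density n; configurations covering the starting
    point are rejected (compatibility with the initial W-particle position).
    """
    x_lo, x_hi, y_lo, y_hi = -a, length + a, -a, a   # bounding box of the tube
    area = (x_hi - x_lo) * (y_hi - y_lo)
    while True:
        trees = [(rng.uniform(x_lo, x_hi), rng.uniform(y_lo, y_hi))
                 for _ in range(sample_poisson(n * area, rng))]
        if all(math.hypot(x, y) > a for x, y in trees):
            break
    for x, y in trees:
        nearest = min(max(x, 0.0), length)   # closest point of the segment
        if math.hypot(x - nearest, y) <= a:
            return False
    return True

rng = random.Random(2)             # fixed seed, for reproducibility only
n, a, length = 10.0, 0.05, 1.0     # hypothetical values with 2 n a |p| t = 1
trials = 4000
fraction = sum(survives(n, a, length, rng) for _ in range(trials)) / trials
predicted = math.exp(-2.0 * n * a * length)
```

The measured fraction of collisionless paths agrees with $e^{-2na|p|t}$ within the statistical error, which is the weight of the $N = 0$ term once $2na$ is identified with $\lambda^{-1}$.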

Furthermore it could be checked that for $t > 0$ the function $g(\xi,x,t) \ge
g_1(\xi,x,t)$ is bounded above by an $L_1(d\xi)$ function once the delta function
contribution coming from the collisionless paths is subtracted from both terms;
hence the limit (1.A2.23) holds also in the $L_1(d\xi)$ sense. Finally, by direct
computation, it follows from (1.A2.23) that:

$$\int \bar g(\xi;x;t)\,d\xi = 1 \tag{1.A2.24}$$

and this fact, together with the above convergence properties and (1.A2.14),
implies the validity of the limit relation:
$\lim_{na^2 \to 0,\ 2na \to \lambda^{-1}} \big(g(\xi;x;t) - e^{-2na|p|t}\,\delta(\xi - \hat x(t))\big) = \bar g(\xi;x;t) - e^{-\lambda^{-1}|p|t}\,\delta(\xi - \hat x(t))$
in $L_1(d\xi)$ for $t > 0$; furthermore it could
be proved that this limit holds, for $t \ge 0$, in the sense of the distributions.
That (1.A2.23) is a solution of the Boltzmann equation can be checked
directly by substituting $\bar g$ into (1.A2.10), with initial condition $\bar g(\xi;x;0) =
\delta(\hat x - \xi)$ and $|p|$ fixed, see for instance [Ga69] or check directly (recalling that
$\sigma(\omega)\,d\omega = \frac12 \sin\frac{\beta}{2}\,\frac{d\beta}{2}$ as, with our conventions, $d\omega = \frac{d\beta}{2}$ if $\omega$ is the “solid
angle” in the direction $\beta$).
To complete the proof of Grad’s limit conjecture it remains to deal with
the $m$-particle distributions. However we skip this point since it involves
straightforward calculations based on changes of variable of the type illustrated
in Fig. 1.A2.2.


We have thus described a proof of the Boltzmann limit conjecture in the
case of a two-dimensional Lorentz gas with hard core W-T interactions
and free gas distribution of the T-particles. The generalization to three
dimensions would be trivial. See also [LS82] where a more detailed and
careful study of the mathematical aspects of the above analysis is performed
with further insights and applications.

A less trivial generalization would be obtained by keeping the hard core 
W-T interaction but assuming that the T-particles are spatially distributed 
as if they were a gas of hard spheres with hard core size being proportional 
to the W-T radius. Other generalizations are conceivable in the direction 
of allowing soft W — T particle interactions and more general T-particle 
distributions. 

Much more difficult and interesting would be the treatment of Knud- 
sen’s model, in which the T-particles are allowed to move without suffering 
changes in their momentum in the collisions with the W-particles. 

Had we done the calculations associated with the proof of (1.A2.9), we 
would have also found evidence of a lack of uniformity of the Boltzmann 
limit in the number m of W-particles even at fixed t: the larger m is, the 
closer one has to get near the Boltzmann limit in order to see factorization 
of the W-particle correlations. 

We also wish to remark that even when the Boltzmann limit conjecture
is true, one cannot expect that the solution $\bar f(r,v,t)$ of the Boltzmann
equation (see §1.8 and (1.A2.8) above) is a good
approximation to the actual distribution $f(r,v,t)$ for large $t$: in fact one
intuitively expects that for times of the order of $t_{m.f.p.}/na^2$ some nontrivial
correlations will start building up thus destroying the molecular chaos and
spoiling the validity of the Boltzmann equation.

This last remark is quite disappointing since it tells us that we cannot use,
without further assumptions, the Boltzmann equation to investigate the 
long time behavior and, in particular, to compute the transport coefficients. 
From a rigorous point of view we cannot even be sure that the lowest order 
in na of the transport coefficients is correctly given by the value obtained in 
the Boltzmann limit. However it seems reasonable that this is, indeed, the 
case at least if the dimension of the space is larger than two (in one dimen- 
sion a simple counterexample can be found by using soluble models [LP66]; 
in this case, however, the Boltzmann equation is a priori not expected to 
be a good approximation). 

For further reading on the Lorentz gas see [WL69], [LS82].

The idea of the Boltzmann limit is clearly stated in [Gr58], see p. 214; the
present proof in the case of the Lorentz gas is done in [Ga69] (for the case
of $g(\xi;x;t)$ only) and was inspired by discussions and suggestions from J.L.
Lebowitz.


II. Statistical Ensembles


§2.1. Statistical Ensembles as Models of Thermodynamics 


Given a mechanical system its microscopic states are described by the
microscopic configurations of $N$ (identical for simplicity) particles with mass
$m$ wandering in a given volume $V$: such configurations are represented by
phase space cells of equal phase space volume $h^{3N}$.

The cells have dimensions $\delta p$ and $\delta q$ in momentum and position coordinates
and they represent the maximal resolution with which we suppose
that the microscopic states can possibly be observed: since we suppose the
particles to be identical the phase space cells differing by a permutation
of the particles must be regarded as identical. The parameter $h = \delta p\,\delta q$
empirically represents the precision with which the microscopic states can
be determined, see Chap.I, §1.1 and §1.2.

Time evolution transforms cells into cells in a small time $\tau$: so that cell $\Delta$
is transformed into $\Delta' = S\Delta$ by a transformation $S$ defined in terms of the
total energy or Hamiltonian function $E(\Delta)$, the sum of the kinetic energy
$K(p)$ and the total potential energy $\Phi(q)$:

$$E(\Delta) = E(p,q) = K(p) + \Phi(q) = \sum_{i=1}^{N} \frac{p_i^2}{2m} + \sum_{i<j} \varphi(q_i - q_j)$$
$$E(p,q) \ge U_{min} = \min E(p,q) > -\infty \tag{2.1.1}$$

where $p = (p_1,\ldots,p_N)$, $q = (q_1,\ldots,q_N)$ are the momentum and position
coordinates of the $N$ particles and $\varphi$ is the interaction potential between particles,
see §1.2. The second of (2.1.1) is a stability constraint that we shall
assume to hold for all $N$ (with $U_{min}$ dependent on $N$): without it many of
the integrals that we shall consider would be divergent.

In fact we shall see that the properly significant physical condition is that
$U_{min}$ can be taken $\ge -BN$ for some $B$; see (2.2.17) below.

We have then considered the stationary probability distributions $\mu$ that
associate with every cell, i.e. with every microscopic state, its probability
$\mu(\Delta)$ so that $\mu(\Delta) = \mu(S\Delta)$.

Families $\mathcal E$ of stationary distributions can be identified with families of
macroscopic equilibrium states in which a generic observable $f$, i.e. a generic
function defined on the phase space cells, takes an average value in the state
$\mu \in \mathcal E$:

$$\bar f = \sum_{\Delta} \mu(\Delta) f(\Delta). \tag{2.1.2}$$

Given a family $\mathcal E$ of stationary distributions on the space of microscopic
states one can consider the averages that the most physically relevant observables
take in a state $\mu \in \mathcal E$:

$$U(\mu) = \sum_{\Delta} \mu(\Delta) E(\Delta) \qquad \text{“energy”}$$
$$V(\mu) = V(\Delta) = V \qquad \text{“volume”}$$
$$K(\mu) = \sum_{\Delta} \mu(\Delta) K(\Delta) \qquad \text{“kinetic energy”} \tag{2.1.3}$$
$$P(\mu) = \sum_{\Delta} \mu(\Delta) P(\Delta) \qquad \text{“pressure”}$$

where $P(\Delta)$ is the momentum variation per unit time and per unit surface
area undergone by the particles in the microscopic state $\Delta$ in the collisions
with the container walls, i.e. $P(\mu)$ is the force per unit surface area exerted
over the walls, see §1.5.

Therefore, given a family $\mathcal E$ of stationary distributions on the space of
microscopic states, we shall call its elements a statistical ensemble or simply
ensemble, after Gibbs, or monode, after Boltzmann, see §1.9. If $\mathcal E$ is such a
family we can associate with every “macroscopic” state $\mu \in \mathcal E$ the quantities
$U, V, K, p$ (energy, volume, average kinetic energy and average pressure)
and we can ask whether the statistical ensemble $\mathcal E$ defines a “model of
thermodynamics” in which the absolute temperature $T$ can be identified
with the average kinetic energy per particle up to a proportionality factor
that, to simplify various expressions, is written as $2/3k_B$:

$$T = \frac{2}{3k_B}\,\frac{K(\mu)}{N} \tag{2.1.4}$$

where $k_B$ is a constant to be determined empirically.

The precise meaning of the locution “defines a model of thermodynamics”
has been discussed in Chap.I, (see (1.6.5)); it means that by varying $\mu$ in $\mathcal E$
and following the variations of $U, V, T, p$ the relation:

$$(dU + p\,dV)/T = \text{exact differential} \tag{2.1.5}$$

holds. Hence it will be possible, by integrating (2.1.5), to define a function
$S(\mu)$ on $\mathcal E$ so that the quantities $U, V, S, T, p$ satisfy the relations of classical
thermodynamics in which $S$ has the interpretation of “entropy”:

$$(dU + p\,dV)/T = dS, \tag{2.1.6}$$

see (1.5.6).

It is possible, in this way, to associate with each macroscopic state $\mu \in \mathcal E$
the quantities $U, T, S, p, V$ and define a “model of thermodynamics”: the
statistical ensembles $\mathcal E$ that enjoy the latter property (2.1.6) were briefly
called by Boltzmann “orthodes”, see §1.6 and §1.9, and therefore we shall
refer to (2.1.6) by calling it the orthodicity property of the ensemble $\mathcal E$.

The existence of important classes of orthodic ensembles was demonstrated
by Boltzmann who also provided some a priori reasons to expect that his
examples should not only give mechanical “models of thermodynamics” but
precisely the thermodynamics of the system, given to us by the experimental
observations: in this attempt he founded ergodic theory and the Boltzmann
equation, see §1.9.
Therefore the “theory of ensembles” poses three questions: 


(1) existence and description of orthodic ensembles 
(2) equivalence of the thermodynamics that they describe 


(3) comparison of the equations of state computed from the ensembles and 
the corresponding ones obtained experimentally. 


In this chapter we shall consider the two basic ensembles studied by Boltz- 
mann and we shall show their orthodicity, following the lines of Boltzmann, 
[Bo84]. 

The canonical ensemble (see §1.5) consists of the probability distributions
$\mu$ on the space of the microscopic states $\Delta$ which describe particles roaming
in a volume $V$ that, for simplicity, we shall suppose cubic and with perfectly
reflecting walls. The probability of a cell is, by definition:

$$\mu(\Delta) = \frac{e^{-\beta E(\Delta)}}{Z(\beta,V)} \tag{2.1.7}$$

with $E(\Delta) = E(p,q)$, $(p,q) \in \Delta$, being the energy of the microscopic
configuration $\Delta$, (2.1.1), and:

$$Z(\beta,V) = \sum_{\Delta} e^{-\beta E(\Delta)} \tag{2.1.8}$$

is a normalization factor which will be called canonical partition function;
the elements $\mu$ of the canonical ensemble are therefore parameterized by the
volume $V$ and the quantity $\beta$.

The microcanonical ensemble consists of the probability distributions $\mu$
parameterized by the parameters $U$ and $V$ defined by:

$$\mu(\Delta) = \begin{cases} 1/\mathcal N(U,V) & \text{if } U - DE \le E(\Delta) \le U \\ 0 & \text{otherwise} \end{cases} \tag{2.1.9}$$

where $\mathcal N(U,V)$, called the microcanonical partition function, is:

$$\mathcal N(U,V) = \sum_{U - DE \le E(\Delta) \le U} 1 = \{\text{number of cells } \Delta \text{ of energy } E(\Delta) \in [U - DE, U]\} \tag{2.1.10}$$

where $DE$ is a macroscopic energy, albeit very small compared to $U$.¹


1 Or better compared with U + BN if the energy is bounded below by a stability bound 
Umin = —BN; our conventions give energy 0 to configurations in which the N particles 
are infinitely far apart and with zero speed: hence, if Umin is the minimum (potential) 
energy then the energy above the minimum energy configuration, the “ground state” is 
U —U° > U + BN and —BN is a lower bound for U°. 




In other words in the microcanonical ensemble one attributes equal probability
to all cells with macroscopic energy $U$ and $0$ probability to the others,
while in the canonical ensemble one attributes relative probability (also
called weight) $e^{-\beta U}$ to all cells with microscopic energy $U$ which, however,
can take all possible values (i.e. all values between the minimum of the
potential energy and $+\infty$).
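The two definitions can be made concrete on a toy example. The following is a minimal sketch, assuming a hypothetical list of five cells with given energies; the function names and all numerical values are illustrative and are not taken from the text. It evaluates the canonical probabilities (2.1.7) with the partition function (2.1.8), and the microcanonical probabilities (2.1.9) with the count (2.1.10).

```python
import math

def canonical(energies, beta):
    """Probabilities exp(-beta E) / Z and the partition function Z."""
    weights = [math.exp(-beta * E) for E in energies]
    Z = sum(weights)
    return [w / Z for w in weights], Z

def microcanonical(energies, U, DE):
    """Equal probability on the cells with U - DE <= E <= U, zero elsewhere."""
    selected = [U - DE <= E <= U for E in energies]
    count = sum(selected)               # number of cells in the energy shell
    return [1.0 / count if s else 0.0 for s in selected], count

energies = [0.0, 1.0, 1.0, 2.0, 3.0]    # hypothetical cell energies
mu_canonical, Z = canonical(energies, beta=1.0)
mu_micro, n_cells = microcanonical(energies, U=2.0, DE=1.0)

# average energy U(mu) in the canonical state, as in the first of (2.1.3)
U_canonical = sum(mu * E for mu, E in zip(mu_canonical, energies))
```

In the canonical state every cell has nonzero probability, with ratios fixed by the energy differences, while in the microcanonical state only the three cells in the shell $[1, 2]$ carry probability, $1/3$ each.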

Proving “orthodicity” of the above ensembles means

(a) expressing $U, K, p$ in terms of two parameters $(\beta, v)$, with $v = V/N$, $\beta > 0$,
in the case of the canonical ensemble, or $(u, v)$ with $u = U/N$, $v = V/N$,
in the case of the microcanonical ensemble, and

(b) showing that, defining $T$ equal to $\frac{2}{3k_B}$ times the average kinetic energy
per particle, then:

$$(du + p\,dv)/T = \text{exact differential} \tag{2.1.11}$$

as $(\beta, v)$ or $(u, v)$ vary, respectively.

We shall see that while the canonical ensemble is already orthodic in finite
volume, the microcanonical ensemble is orthodic “only” in the “thermodynamic
limit” $N \to \infty$, $U \to \infty$, $V \to \infty$ so that $U/N = u$, $V/N = v$ stay
constant (or tend to a constant).

This will be the physically interesting limiting situation, if one keeps in 
mind the size of N, in real physical systems. 


§2.2. Canonical and Microcanonical Ensembles: Orthodicity. 


There are many other examples of ensembles which are orthodic at least in 
the thermodynamic limit. However before proceeding to the discussion of 
other ensembles and of their equivalence (i.e. of their identity as models of 
thermodynamics) it is convenient to describe how one can check orthodicity 
of the canonical and microcanonical ensembles. This check is a key to 
the understanding of Boltzmann’s ideas and to the understanding of the 
mathematical mechanisms that make tractable a problem that at first sight 
might look formidable. 

Consider first the canonical ensemble case (2.1.7), (2.1.8).

The partition sum $Z(\beta,V)$ can be computed, if the cell size $h = \delta p\,\delta q$ is
small, as:

$$Z(\beta,V) = \int \frac{e^{-\beta K(p)}\,e^{-\beta \Phi(q)}\,dp\,dq}{N!\,h^{3N}} \tag{2.2.1}$$

where the factor $N!$ takes into account that the $N$ particles are strictly
identical and, therefore, indistinguishable as a matter of principle, so that
by permuting the $N$ particles one obtains microscopic states described by
phase space cells that must be regarded as identical.

We can identify a configuration $\Delta$ (i.e. a phase space cell, as the two notions
coincide having adopted a discrete viewpoint, see §1.1) of the system
by giving the “occupation numbers” $n_1$ of particles in a small cube $C_1$ of
dimension $(\delta p\,\delta q)^3 = h^3$, $n_2$ in the cube $C_2$, etc (the cubes should not be
confused with the phase space cells: they are 6-dimensional boxes in the
phase space of a single particle). Therefore in (2.2.1) we replace the sum
corresponding to (2.1.8) with an integral: in this way a twofold error is
committed:

(i) an approximation error due to the fact that $E(p,q) = E(\Delta)$ only at the
center of the cell $\Delta$: we shall call this error (i.e. the act of confusing the
average value in a cell with the actual value at the center) an analytic error.

(ii) an error due to the fact that a microscopic configuration $\Delta \subset R^{6N}$
described by giving the numbers $n_1, n_2, \ldots$ is counted in the integral (2.2.1)
$N!/n_1!\,n_2!\ldots$ times instead of $N!$ times. We call this a combinatorial error.
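The combinatorial error in (ii) can be made explicit with a short computation. The sketch below is only an illustration with hypothetical occupation numbers and a hypothetical function name: it evaluates the multiplicity $N!/n_1!\,n_2!\ldots$ with which a configuration is counted and compares it with $N!$.

```python
import math

def multiplicity(occupations):
    """Number of labeled arrangements N!/(n1! n2! ...) for given occupation numbers."""
    total = sum(occupations)
    result = math.factorial(total)
    for n in occupations:
        result //= math.factorial(n)    # exact integer division at each step
    return result

no_multiple_occupation = multiplicity([1, 1, 1])   # every cube holds at most one particle
double_occupation = multiplicity([2, 1])           # one cube holds two particles
```

When no cube is occupied more than once the multiplicity equals $N!$ and the division by $N!$ in (2.2.1) is exact; with a doubly occupied cube the configuration is counted only $3$ times instead of $3! = 6$.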


Both errors are, obviously, infinitesimal as $h \to 0$ (if one means that both
dimensions $\delta p$ and $\delta q$ tend to $0$ as $h \to 0$, as we always assume). They
were neglected by Boltzmann in his analysis since he had no reason to think 
that the cell size would play any role in his classical world, besides that of 
allowing one to speak of the “number of configurations” of given energy. 

We shall neglect them here as well, postponing their analysis until it will 
become possible to discuss them a posteriori on the basis of theoretical 
consequences of the theory drawn by neglecting them: hence the theory, 
once developed, will allow us to evaluate under which physical conditions 
negligibility of the above errors becomes reasonable. 

Anticipating the results of the analysis (see also §1.2; details will be pro- 
vided in §2.6) these errors will become negligible at “high temperature” 
and, in the example of a perfect gas (®(q) = 0), at least for: 


T > Ty = (mkgh `? p?) ! (2.2.2) 


where p = N/V, kp = 1.38 x 107 Kerg°K-12 

The relation (2.2.2) is obtained (summarizing part of the discussion in §1.2) 
by remarking that the representation of the microscopic states by cells can 
be consistent only if 6p and ôq are smaller than the average values of the 
momentum and of the intermolecular distances.3 

Since by (2.1.4) the absolute temperature is such that $3k_BT/2$ is the
average value of the kinetic energy per particle, i.e. the average of
$p^2/2m$, it is clear that the average momentum will have order of magnitude
$\bar p = \sqrt{m k_B T}$, while the average interparticle distance will be
$\bar q = \sqrt[3]{V/N} = \rho^{-1/3}$ and therefore the condition
$h = \delta p\,\delta q < \bar p\,\bar q = \sqrt{m k_B T}\,\rho^{-1/3}$ follows
(necessary but not sufficient) yielding (2.2.2). See §2.6 below for a more
detailed analysis.

2 The inequality (2.2.2) in the case of hydrogen at normal density and pressure,
$m = 3.34 \times 10^{-24}$ g, $N = 2.7 \times 10^{19}$ particles in $V = 1$ cm$^3$, and
choosing $h$ = Planck’s constant $= 6.62 \times 10^{-27}$ erg sec, gives $T_0 = 1$ °K;
very different values for $T_0$ are obtained for other gases; see §1.2, where other
necessary conditions for the validity of the approximations are also taken into account.

3 This is a condition less stringent than the one examined in §1.2, (1.2.18), where
$T > T_0$ also imposes the compatibility of the description in terms of cells with the
classical microscopic dynamics as a cell permutation.
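The order of magnitude quoted in footnote 2 can be reproduced directly from (2.2.2). The following is a minimal sketch in CGS units; the function name `t0_kelvin` is an illustrative assumption, the mass is read as $3.34 \times 10^{-24}$ g, and the other inputs are those quoted in the footnote for hydrogen.

```python
def t0_kelvin(mass_g, density_per_cm3, h=6.62e-27, k_b=1.38e-16):
    """T_0 = h^2 rho^(2/3) / (m k_B), i.e. (m k_B h^-2 rho^-2/3)^-1, in CGS units."""
    return h**2 * density_per_cm3 ** (2.0 / 3.0) / (mass_g * k_b)


# Hydrogen at normal density and pressure, data as quoted in footnote 2.
T0_HYDROGEN = t0_kelvin(mass_g=3.34e-24, density_per_cm3=2.7e19)
print(f"T_0 for hydrogen: {T0_HYDROGEN:.2f} K")
```

With these inputs the value comes out slightly below 1 °K, consistent with the order of magnitude stated in the footnote; since $T_0 \propto \rho^{2/3}/m$, denser or lighter systems give larger values.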

It is important, however, to keep in mind that when (2.2.2) is not valid,
hence the cell sizes cannot be neglected, the very consistency of a cell
representation of the microscopic states fails, and the whole theory should 
be reexamined from scratch. It will appear that in such circumstances 
quantum mechanics becomes important and classical statistical mechanics 
may lose validity in a fundamental sense. 

Making the assumption that (2.2.1) is correct without the necessary ana- 
lytic and combinatorial corrections and performing the orthodicity analysis 
is equivalent to setting h = 0, i.e. to admitting the possibility of infinite 
precision (simultaneous) measurements of position and momenta of (all) 
particles. 

We can evaluate, following Boltzmann, [Bo84], the thermodynamic quantities
in the state described by the canonical distribution $\mu$ with parameters
$\beta, V$.

To simplify notation we shall identify the region V occupied by the system 
with the measure V of its volume (which we always think of as cubic). 

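We shall use the fact that in our approximations the probability of finding
the system in the microscopic state $dp\,dq$ is
$e^{-\beta(K(p)+\Phi(q))}\,dp\,dq/(N!\,h^{3N}Z(\beta,V))$, so that (2.1.3) become:

$$\begin{aligned}
K &= K(\mu) = \int \sum_{i=1}^{N}\frac{p_i^2}{2m}\; e^{-\beta(K(p)+\Phi(q))}\,\frac{dp\,dq}{h^{3N}N!\,Z(\beta,V)}\\
v &= V/N\\
U &= U(\mu) = -\frac{\partial}{\partial\beta}\log Z(\beta,V)\\
p &= P(\mu) = \sum_Q \frac{N}{Z(\beta,V)}\int_{v>0} e^{-\beta(K(p)+\Phi(q))}\;2mv^2\;\frac{s}{S}\;\frac{dq_2\cdots dq_N\,dp_1\cdots dp_N}{h^{3N}N!}
\end{aligned} \tag{2.2.3}$$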


where the sum is over the small cubes $Q$ adjacent to the boundary of the box
$V$ by a side with area $s$ while $S = \sum_Q s$ is the total area of the container
surface and $q_1$ is the center of $Q$ (note that $S = 6V^{2/3}$), see §1.4, (1.4.4).
It is not difficult to transform the last of (2.2.3) into a more useful form:

$$p = \beta^{-1}\frac{\partial}{\partial V}\log Z(\beta,V), \tag{2.2.4}$$


the calculation is illustrated in detail in §2.6 below where we also collect 
other more technical deductions. 
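For a perfect gas ($\Phi = 0$) the last of (2.2.3) and (2.2.4) can be compared numerically. The following is a minimal sketch, assuming that for $\Phi = 0$ the gaussian factor in (2.2.3) reduces to a Maxwellian density for the velocity component normal to the wall; the function names and the choice of units are illustrative, not part of the text.

```python
import math


def wall_pressure(beta, n, volume, mass=1.0, steps=20000):
    """(N/V) * integral over v > 0 of 2 m v^2 f(v) dv, with f the Maxwellian
    density of the normal velocity, evaluated with the midpoint rule."""
    sigma = 1.0 / math.sqrt(mass * beta)      # thermal velocity scale
    v_max = 12.0 * sigma                      # the tail beyond this is negligible
    dv = v_max / steps
    norm = math.sqrt(mass * beta / (2.0 * math.pi))
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * dv
        total += 2.0 * mass * v * v * norm * math.exp(-0.5 * beta * mass * v * v) * dv
    return (n / volume) * total


def log_z_pressure(beta, n, volume, eps=1e-6):
    """beta^-1 d/dV log Z, using log Z = N log V + (terms independent of V)."""
    d_log_z = n * (math.log(volume * (1 + eps)) - math.log(volume * (1 - eps)))
    return d_log_z / (2.0 * eps * volume) / beta
```

Both functions should return a value close to $N/(\beta V)$, which is the perfect gas law in the form $p = k_B T N / V$.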
At this point we only need a simple direct check. Let: 


$$F = -\beta^{-1}\log Z(\beta,V),\qquad S = (U-F)/T \;\Longleftrightarrow\; F = U - TS \tag{2.2.5}$$

then:

$$T = \frac{2}{3k_B}\frac{K(\mu)}{N} = \frac{1}{k_B\beta},\qquad \frac{dT}{T} = -\frac{d\beta}{\beta} \tag{2.2.6}$$

because the integral over q in (2.2.3) factorizes and the one over p is elemen- 
tary (i.e. gaussian): 


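$$\begin{aligned}
dF &= \left(\beta^{-2}\log Z(\beta,V) + \beta^{-1}U\right)d\beta - \beta^{-1}\frac{\partial}{\partial V}\log Z(\beta,V)\,dV = \\
&= (F-U)\,dT/T - p\,dV = -S\,dT - p\,dV
\end{aligned} \tag{2.2.7}$$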


hence: 
TdS = d(F+TS)+pdV = dU +pdV (2.2.8) 


which coincides with (2.1.6). 
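The chain (2.2.5)-(2.2.8) can be checked numerically at finite $N$. The sketch below is an illustration under stated assumptions: it uses the perfect gas ($\Phi = 0$), for which $\log Z$ is explicit, units $k_B = m = h = 1$, and finite differences in place of the derivatives; all names are hypothetical.

```python
import math

N = 50  # number of particles (illustrative; no thermodynamic limit is taken)


def log_z(beta, volume, mass=1.0, h=1.0):
    """log Z(beta, V) of a perfect gas, from (2.2.1) with Phi = 0."""
    return (N * math.log(volume) - math.lgamma(N + 1)
            + 1.5 * N * math.log(2.0 * math.pi * mass / (beta * h * h)))


def state(beta, volume, k_b=1.0, eps=1e-5):
    """Return (T, U, p, S) from central finite differences of log Z."""
    dlz_dbeta = (log_z(beta * (1 + eps), volume)
                 - log_z(beta * (1 - eps), volume)) / (2 * eps * beta)
    dlz_dvol = (log_z(beta, volume * (1 + eps))
                - log_z(beta, volume * (1 - eps))) / (2 * eps * volume)
    temperature = 1.0 / (k_b * beta)                  # (2.2.6)
    energy = -dlz_dbeta                               # U = -d/dbeta log Z
    pressure = dlz_dvol / beta                        # (2.2.4)
    free_energy = -log_z(beta, volume) / beta         # (2.2.5)
    entropy = (energy - free_energy) / temperature    # (2.2.5)
    return temperature, energy, pressure, entropy


def orthodicity_defect(beta, volume, d_beta, d_volume):
    """T dS - (dU + p dV) for a small change of (beta, V), with T and p
    taken as the mean of their values at the two end points."""
    t0, u0, p0, s0 = state(beta, volume)
    t1, u1, p1, s1 = state(beta + d_beta, volume + d_volume)
    lhs = 0.5 * (t0 + t1) * (s1 - s0)
    rhs = (u1 - u0) + 0.5 * (p0 + p1) * d_volume
    return lhs - rhs
```

For small increments the defect is of third order in the step, far smaller than either side of (2.2.8), and this holds for any fixed $N$, in line with the “unconditional” validity discussed in the text.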

We also read from the above relations the physical interpretation of the
partition function $Z(\beta, V)$: in fact the function $F = -\beta^{-1}\log Z(\beta, V)$ is
the free energy of thermodynamics.

Equation (2.2.8) shows the orthodicity of the canonical ensemble. 

Note that (2.2.8) has been derived without any necessity to consider the 
thermodynamic limit $N \to \infty$, $V \to \infty$, $V/N \to v$, as long as one accepts
the approximations leading to (2.2.1) (i.e. if the cell size $h$ can be taken as
0 or, more physically, as negligible). This “unconditional” validity, for all
N and V, should be regarded as a coincidence, as the following discussion 
shows. In the other ensemble cases consideration of the thermodynamic 
limit is necessary to establish the correct thermodynamic relations between 
U,T,S,p,p,V. In fact, to prove orthodicity of ensembles other than the 
canonical it is necessary to impose some physically important conditions 
on the interaction potential energy $\Phi(q)$: the “stability and temperedness
conditions”, see below.

In particular the situation is somewhat more involved in the microcanonical 
ensemble case because in this case it becomes really necessary to consider 
the thermodynamic limit. 

The microcanonical partition function is defined in (2.1.9) and, up to the 
errors already pointed out in the case of the canonical ensembles, it can be 
written as: 


$$\mathcal N(U,V) = \int_{J_E}\frac{dp\,dq}{h^{3N}N!} \tag{2.2.9}$$

where $J_E$ is the phase space set in which $U - DE \le E(p,q) \le U$.

The thermodynamic quantities are defined by (2.1.3), and the pressure can
be written just as in (2.2.3) with $\mathcal N(U,V)$, 1 replacing $Z(\beta,V)$,
$e^{-\beta(K(p)+\Phi(q))}$ respectively, and with the integral extended to the
domain $U - DE \le K(p)+\Phi(q) \le U$. In this case $U$ is a parameter defining,
together with $V$, the elements of the ensemble. The temperature is defined as
$2/3k_B$ times the average kinetic energy per particle. See §1.6.


Also in this case orthodicity is derived by a direct check. Let:

$$S = k_B\log\mathcal N(U,V) \tag{2.2.10}$$


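and let $T$ be $(2/3k_B)$ times the average kinetic energy per particle; one
finds:

$$dS = k_B\left(\frac{1}{\mathcal N}\frac{\partial\mathcal N}{\partial U}\,dU + \frac{1}{\mathcal N}\frac{\partial\mathcal N}{\partial V}\,dV\right) \tag{2.2.11}$$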


and we ask whether the right-hand side of (2.2.11) can be written as (dU + 
p dV)/T with p, V, T defined in (2.1.3), i.e. by relations analogous to (2.2.3). 


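The derivatives of $\mathcal N$ can be studied as in the case of the canonical ensemble,
and one finds that (2.2.11) can be rewritten, see §2.6, as

$$dS = k_B\,\frac{3N}{2}\left(\Big(1-\frac{2}{3N}\Big)\langle K(p)^{-1}\rangle\,dU + \frac{p\,dV}{\langle K(p)\rangle^{*}}\right) \tag{2.2.12}$$

where (if $J_E$ is the domain $U - DE \le E(p,q) \le U$) we have set for $\alpha$ real:

$$\langle K(p)^{\alpha}\rangle = \frac{\int_{J_E}K(p)^{\alpha}\,dp\,dq/h^{3N}N!}{\int_{J_E}dp\,dq/h^{3N}N!} \tag{2.2.13}$$

$$\langle K(p)^{\alpha}\rangle^{*} = \frac{\int_{J_E\cap\{q_1\in dV\}}K(p)^{\alpha}\,dp\,dq/h^{3N}N!}{\int_{J_E\cap\{q_1\in dV\}}dp\,dq/h^{3N}N!} \tag{2.2.14}$$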


$dV$ being an infinitesimal region (with volume also denoted $dV$) around $V$
obtained by displacing by a distance $\eta$, along the external normal to $V$, the
surface elements of $V$.

In other words $\langle K(p)^{\alpha}\rangle$ is the average value of the $\alpha$-th power of $K(p)$
with respect to the considered microcanonical distribution, while
$\langle K(p)^{\alpha}\rangle^{*}$ is the average value of $K(p)^{\alpha}$ with respect to a
distribution $\mu^{*}$ obtained by imposing the condition that one among the $N$
particles is constrained to be in the region $dV$ around the surface of $V$. If the
relations

$$\langle K(p)^{\alpha}\rangle,\ \langle K(p)^{\alpha}\rangle^{*} = K(\mu)^{\alpha}\,(1+\theta_N) \tag{2.2.15}$$

were valid, with $\theta_N \to 0$ as $N \to \infty$ and with $K(\mu)$ equal to the average
kinetic energy in the microcanonical ensemble, then one could deduce that (2.2.12)
becomes, after dividing both sides by $N$ and letting $N \to \infty$ with $U/N$, $V/N$
constant:

$$ds = (du + p\,dv)/T. \tag{2.2.16}$$


In the microcanonical case one sees from (2.2.12), (2.2.16) that the partition
sum directly has the physical meaning of entropy: $S = k_B\log\mathcal N(U,V)$.
Since $\mathcal N(U,V)$ is the number of microscopic states with energy $U$ and
allowed volume $V$ (see also §1.4) this is the well-known Boltzmann’s relation
expressing entropy as proportional to the logarithm of the number of possible
microscopic states with given energy and volume, see [Bo77] and §1.9.
To complete the analysis of the microcanonical ensemble orthodicity it 
remains to check (2.2.15): as already mentioned one needs for this purpose 
suitable assumptions on the potential energy $\Phi(q)$.

Such assumptions, which will have an important physical meaning, are: 


(a) Stability: this means that there is a constant $B$ such that for every
configuration $(q_1,\ldots,q_N) = q$:

$$\Phi(q) = \sum_{i<j}\varphi(q_i - q_j) \ge -BN \tag{2.2.17}$$


This property not only says that the potential energy is bounded below (as 
usual in many mechanical systems) but it also says that its minimum cannot 
be too small as N grows. 


(b) Temperedness: there are three constants $C > 0$, $x > 0$, $R > 0$ for
which:

$$|\varphi(q - q')| < C\,|q - q'|^{-(3+x)}\quad\text{for } |q - q'| > R \tag{2.2.18}$$


This is essentially a condition that says that “far” particles have “small” 
interaction: by this hypothesis the interaction energy between a particle 
and a uniformly filled half-space approaches 0 as the distance between the
two tends to $\infty$. In a large system the macroscopic subsystems have “small”
interaction energy (i.e. much smaller than the product of the volumes oc- 
cupied by each). This can be considered as a property of the “short range” 
of the forces. 
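The half-space statement follows from temperedness by a one-line estimate. As a sketch, assume the half-space is filled at uniform density $\rho$ and lies at distance $d > R$ from the given particle at $q$ (the symbols $\rho$ and $d$ are notation introduced here for illustration), and assume the decay $|\varphi(r)| < C\,r^{-(3+x)}$ for $r > R$; bounding the half-space by the exterior of the ball of radius $d$ gives:

```latex
\left| \int_{\text{half-space}} \rho\,\varphi(q-q')\,dq' \right|
\;\le\; \rho\, C \int_{|q-q'|>d} |q-q'|^{-(3+x)}\,dq'
\;=\; 4\pi\rho\, C \int_d^{\infty} r^{-(1+x)}\,dr
\;=\; \frac{4\pi\rho\, C}{x}\, d^{-x}
\;\xrightarrow[d\to\infty]{}\; 0 .
```

For a potential decaying like $r^{-1}$, as in the Coulomb case, the same radial integral diverges, which is the qualitative reason why condition (b) fails there.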


Relations (2.2.17),(2.2.18) are not satisfied in the special but very impor- 
tant case of systems of charged particles interacting via the Coulomb force: 
qualitatively the problem really comes only from condition (b) because (a) 
is satisfied as one thinks that in realistic cases particles have hard cores 
(however, in spite of this, we shall see that even (a) poses a problem of a 
quantitative nature as the “obvious” hard cores are often of nuclear size 
which turns out to be too small for compatibility with the observations). 
The statistical mechanics of systems interacting via Coulomb forces is therefore
more delicate than that of systems interacting via phenomenological pair 
forces with short range (like Lennard-Jones potentials) which mean effective 
hard cores of atomic size (rather than nuclear size). 

Even more delicate is the statistical mechanics of gravitationally interacting 
particles. We shall see that while systems of charged hard core particles with 
the property of a neutral total charge do obey “normal thermodynamics” 
the same is not true for gravitationally interacting particles (so that we 
should not expect that a Star obeys the same thermodynamics as a pot of 
gas, just in case this idea occurred to you).

Equations (2.2.15) are related to the law of large numbers: they say that
the variables $\frac{1}{N}K(p)$, regarded as random variables with a distribution given
by an element $\mu$ of the microcanonical ensemble or of the corresponding $\mu^{*}$,
see (2.2.15), are variables with a “dispersion that approaches 0” in the
limit $N \to \infty$, because the ratio $K(p)/K(\mu)$ is such that
$\langle (K(p)/K(\mu))^{\alpha}\rangle$, $\langle (K(p)/K(\mu))^{\alpha}\rangle^{*} \to 1$ for all $\alpha$;
or the fluctuations of $K(p)^{\alpha}$, with respect to its average value
$\langle K(p)^{\alpha}\rangle \simeq K(\mu)^{\alpha}$, do not have the order of magnitude
of $\langle K(p)^{\alpha}\rangle$ itself, but are much smaller.


The kinetic energy $K(p)$ is however a sum of $N$ “almost independent”
variables $p_1^2/2m,\ldots,p_N^2/2m$ (i.e. not really such because they are
constrained by $U - DE - \Phi(q) \le K(p) \le U - \Phi(q)$). Therefore it is clear
that (2.2.15)
requires a proof and it does not reduce trivially to the law of large numbers 
which is formulated for independent variables. We have in fact just discussed 
which extra assumptions are necessary in order to be able to show the 
microcanonical ensemble orthodicity. 
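For the free gas ($\Phi = 0$) the averages (2.2.13) can be computed in closed form, since in polar coordinates the kinetic energy has density proportional to $K^{3N/2-1}$ on the shell $[U-DE, U]$. The sketch below evaluates $\theta_N = \langle K^{\alpha}\rangle/\langle K\rangle^{\alpha} - 1$ for a few values of $N$; the function names and the choice $DE/U = 0.5$ are illustrative assumptions, and the interacting case, where stability and temperedness are needed, is not addressed.

```python
def shell_moment(alpha, n, de_over_u=0.5):
    """<K^alpha> / U^alpha for the free gas: K has density ~ K^(3N/2 - 1)
    on [U - DE, U]; r below is (U - DE) / U."""
    a = 1.5 * n
    r = 1.0 - de_over_u
    return (a / (a + alpha)) * (1.0 - r ** (a + alpha)) / (1.0 - r ** a)


def theta(alpha, n, de_over_u=0.5):
    """Relative deviation of <K^alpha> from <K>^alpha, cf. (2.2.15)."""
    mean_k = shell_moment(1.0, n, de_over_u)
    return shell_moment(alpha, n, de_over_u) / mean_k ** alpha - 1.0


for n in (10, 100, 1000):
    print(n, theta(-1.0, n), theta(2.0, n))
```

In this special case the deviation decays roughly like $1/N^2$ once the inner radius of the shell no longer matters, so (2.2.15) holds with room to spare; the difficulty discussed in the text lies entirely in the interacting case.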

From a historical viewpoint the above treatment of the canonical ensem- 
ble is essentially the same as the original in Boltzmann, [Bo84]; the case of 
the microcanonical ensemble is somewhat different and more involved: the 
reason is that in Boltzmann the assumptions (2.2.15) are only implicitly 
made: in fact Boltzmann studies the problem from a slightly different view- 
point. He considers a priori a quantity that he identifies with the amount 
dQ of heat that the system receives when the microcanonical parameters 
change by dU,dV. In this way he shows that the microcanonical ensemble 
is orthodic even in a finite volume. This is possible because the definitions 
of dQ that he uses in the two ensembles are different and in the language 
used here they are consistent only in the thermodynamic limit (and only if 
(2.2.15) are assumed). But this is not the moment to attempt a philologically
correct treatment of Boltzmann’s ideas (a treatment that is still quite
unsatisfactory in the literature, see §1.9). 

To conclude this section we can ask how strongly the orthodicity of the 
canonical and microcanonical ensembles depends upon the hypothesis that 
(2.2.1) and (2.2.9) are good approximations to the partition sums (as finite 
sums over cells in phase space), and how strongly the orthodicity depends 
on the hypothesis that the system consists of only one species of identical 
particles. 

Without exhibiting any analytic calculations we simply say that, in the 
case that the integrals (2.2.9) or (2.2.1) are replaced by the sums that they 
are supposed to approximate, orthodicity must be formulated differently: 
in the canonical ensemble one has to interpret $\beta$ as proportional to the
inverse of the absolute temperature while in the case of the microcanonical
ensemble one must define the entropy directly via Boltzmann’s formula:
$S = k_B\log\mathcal N(U, V)$.

One obtains in this way two models of thermodynamics, which are models
in a sense which is natural although different from the one so far used.
Namely in the first case by setting $T = 1/k_B\beta$ the expression $(dU + p\,dV)/T$ is
an exact differential, but $T$ is no longer proportional to the average kinetic
energy; in the second case setting $T = (dU + p\,dV)/dS$ the quantity $T$
is independent of the transformation that generates the variations $dU, dV$
and the corresponding $dS$. Furthermore it is possible to prove that the two
models of thermodynamics are equivalent, [Ru69].

The important and well-known universal identification, [Cl65], [Bo66], of
the average kinetic energy with the absolute temperature is no longer
valid: in view of the role that this identification played in the birth of statis- 
tical mechanics and in its developments one should regard this as a shocking 
major change. See Chap.III for a more detailed analysis of this point. 

Therefore the ensembles in which the partition functions are evaluated 
without the “continuum approximation”, valid only when (2.2.2) (or better 
when (1.2.4), (1.2.5)) hold, can still be used for the formal construction of 
models of thermodynamics. 

However, as a consequence of the general considerations following (2.2.2), 
in such cases it is not clear what the physical meaning of the thermodynam- 
ics that is constructed from the mechanical model could be: a physically 
correct investigation would in fact require, in such situations, using quantum 
mechanics as a basis for the treatment. 

For what concerns the assumption of existence of only one species of par- 
ticles in the systems considered so far we simply mention that orthodicity 
does not depend on this assumption. But there are some obvious changes 
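that one has to introduce in the formulation and in the combinatorial
factors to be used. As an example we just write the partition function for a
general system with $N_1$ particles of species 1 and mass $m_1$, $N_2$ particles of
species 2 with mass $m_2$, etc. Under the assumption that the cell size can be
neglected we have (here $p_j, q_j$ denote the momenta and positions of the
particles of species $j$):

$$Z(\beta,V) = \frac{1}{N_1!\,N_2!\cdots}\int\frac{dp_1\,dq_1}{h^{3N_1}}\,\frac{dp_2\,dq_2}{h^{3N_2}}\cdots\; e^{-\beta\sum_j K(p_j) - \beta\Phi(q_1,q_2,\ldots)} \tag{2.2.19}$$

and the probability of a microscopic state will be:

$$\prod_j\left(\frac{dp_j\,dq_j}{N_j!\,h^{3N_j}}\right)\cdot\frac{e^{-\beta\sum_j K(p_j) - \beta\Phi(q_1,q_2,\ldots)}}{Z(\beta,V)} \tag{2.2.20}$$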


The natural generalization to this case of the notion of orthodicity is checked 
in exactly the same way as in the previous case of only one species of par- 
ticles. 


§2.3. Equivalence between Canonical and Microcanonical Ensembles.


In the above study of canonical and microcanonical ensembles Boltzmann’s
constant appeared several times: it was always denoted by the same symbol,
but it was to be regarded as a priori different in each case.

In fact this is a universal constant $k_B = 1.38 \times 10^{-16}$ erg °K$^{-1}$.

The logical itinerary leading to the identification of $k_B$ and to showing the
equivalence of thermodynamic models described by the orthodic canonical 
and microcanonical ensembles is discussed in this section. 

Suppose first that the molecules do not interact, $\varphi = 0$, i.e. consider the
microscopic model of a free gas. In this case it is easy to compute explicitly
the microcanonical and canonical partition functions, $\mathcal N$, $Z$, in the
approximation in which cell size is neglected (see (2.2.1) and (2.2.9)).

One finds, performing the integrals (2.2.1) and (2.2.9) in polar coordinates
in momentum space:

$$\begin{aligned}
\mathcal N(U,V) &= \frac{V^N}{h^{3N}N!}\,\frac{\Omega(3N)}{3N}\left(\sqrt{2mU}^{\,3N} - \sqrt{2m(U-DE)}^{\,3N}\right)\\
Z(\beta,V) &= \frac{V^N}{h^{3N}N!}\,\sqrt{2\pi m\beta^{-1}}^{\,3N}
\end{aligned} \tag{2.3.1}$$

where $\Omega(d) = 2\sqrt{\pi}^{\,d}\,\Gamma(d/2)^{-1}$ is the surface of the $d$ dimensional unit
sphere and $\Gamma(x)$ is Euler’s gamma function (i.e. $\Gamma(x) = (x-1)!$).

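The limits of (2.3.1) as $N \to \infty$, $V \to \infty$, with $V/N = v$, $U/N = u$
fixed, are easily studied via Stirling’s formula
$\Gamma(x+1) = x^x e^{-x}\sqrt{2\pi x}\,(1+O(1/x))$, or
$N! = N^N e^{-N}\sqrt{2\pi N}\,(1+O(1/N))$ and one finds, see §2.1 and
(2.2.5),(2.2.10):

$$\begin{aligned}
S &= k_B\log\mathcal N(U,V) = Nk_B\Big(\log\frac{V}{N} + \frac32\log\frac{U}{N} + \mathrm{const} + O\big(\tfrac{\log N}{N}\big)\Big)\\
F &= -\beta^{-1}\log Z(\beta,V) = -N\beta^{-1}\Big(\log\frac{V}{N} - \frac32\log\beta + \mathrm{const} + O\big(\tfrac{\log N}{N}\big)\Big).
\end{aligned} \tag{2.3.2}$$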
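The size of the $O(\log N / N)$ correction in the entropy can be seen numerically by comparing the exact free-gas expression for $\mathcal N(U,V)$ with its leading term. This is a sketch under stated assumptions: units $k_B = m = h = 1$, the choice $DE = 0.1\,U$, and an additive constant worked out here from Stirling’s formula (the text leaves it as “const”); function names are illustrative.

```python
import math


def log_omega(d):
    """log of the surface of the d-dimensional unit sphere, 2 pi^(d/2) / Gamma(d/2)."""
    return math.log(2.0) + 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d)


def entropy_per_particle(n, u, v, de_over_u=0.1, mass=1.0, h=1.0):
    """(1/N) log N(U, V) for the free gas, with U = N u and V = N v."""
    a = 1.5 * n
    log_ball = (n * math.log(n * v) - 3 * n * math.log(h) - math.lgamma(n + 1)
                + log_omega(3 * n) - math.log(3 * n)
                + a * math.log(2.0 * mass * n * u))
    log_shell = math.log1p(-((1.0 - de_over_u) ** a))   # subtracts the inner ball
    return (log_ball + log_shell) / n


def entropy_asymptotic(u, v, mass=1.0, h=1.0):
    """Leading term log v + (3/2) log u + const, with the Stirling constant."""
    const = 1.5 * math.log(4.0 * math.pi * mass / (3.0 * h * h)) + 2.5
    return math.log(v) + 1.5 * math.log(u) + const


for n in (10, 100, 10000):
    print(n, entropy_per_particle(n, 1.0, 2.0) - entropy_asymptotic(1.0, 2.0))
```

The printed differences shrink with $N$ at the rate expected from a $\log N / N$ correction, and the dependence on $DE$ disappears from the per-particle entropy as $N$ grows.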


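On the basis of the discussion in §2.2, $S$ has the interpretation of entropy in
the microcanonical ensemble and $F$ of free energy, $F = U - TS$, see (2.2.5).
Hence we can compute the pressure in both cases:

$$\begin{aligned}
p &= T\Big(\frac{\partial S}{\partial V}\Big)_U = k_BT\,\frac{N}{V}\Big(1 + O\big(\tfrac1N\big)\Big) &&\text{microcanonical}\\
p &= -\Big(\frac{\partial F}{\partial V}\Big)_\beta = \beta^{-1}\frac{N}{V} = \frac{k_BT}{v} &&\text{canonical}
\end{aligned} \tag{2.3.3}$$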


If $N_A$ is Avogadro’s number ($N_A = 6.0 \times 10^{23}$ molecules per mole) and
$N = nN_A$ (with $n$ = number of moles), one sees that (2.3.3) establish that
the perfect gas equation of state is $pV = nRT$ in both cases, provided the
value of $k_B$ is chosen the same in the two cases and provided it has the
numerical value:

$$k_B = R/N_A = \text{gas constant}/N_A = (8.30\times 10^{7}/N_A)\ \text{erg}\,{}^{\circ}\text{K}^{-1} = 1.38\times 10^{-16}\ \text{erg}\,{}^{\circ}\text{K}^{-1}. \tag{2.3.4}$$


The specific heat at constant volume turns out to be, in the thermodynamic
limit (after an easy calculation) $\frac32 nR$: for instance in the canonical ensemble
the average total energy, equal to the average total kinetic energy because
$\Phi = 0$, is $\frac32 Nk_BT$ and is volume independent, at fixed $N$; see (2.2.6) and
§2.6 below.

As we see the two thermodynamics defined for the perfect gas by the two
microscopic models, canonical and microcanonical, coincide in the
thermodynamic limit and they coincide with the experimentally known
thermodynamics of a free gas, provided the constant $k_B$ is chosen in both
cases as in (2.3.4).

We now ask whether the coincidence of the thermodynamics defined by the 
two statistical ensembles remains the same also for more general systems. 

This is the problem of equivalence of the microcanonical and canonical 
ensembles. It is a fundamental problem because it would be a serious setback 
for the whole theory if there were different orthodic ensembles predicting 
different thermodynamics for the same system, i.e. different relations among 
$u, v, T, p, s$, all compatible with the general laws of classical thermodynamics
although different from each other. 

We shall see that “in general” for each given system there is equivalence (in 
the thermodynamic limit) between canonical and microcanonical ensembles 
if the constant $k_B$ appearing in the theory of the two ensembles is taken to
be the same. 

Once equivalence of the thermodynamics, defined either by the canonical or 
by the microcanonical ensembles corresponding to a given system, has been 
established we shall ask the further question of whether the constant $k_B$ that
appears as proportionality factor between temperature and average kinetic 
energy per degree of freedom is the same for all other systems, i.e. whether 
the numerical value (2.3.4) is system independent. 

The scheme of the proof of equivalence between canonical and microcanon- 
ical ensembles, already used by Boltzmann and Gibbs, is the following. Set 


$$\mathcal N_0(U,V) = \int_{E(p,q)\le U}\frac{dp\,dq}{h^{3N}N!} \tag{2.3.5}$$

Note that $\mathcal N(U,V) = \mathcal N_0(U,V) - \mathcal N_0(U-DE,V)$ and that the
relation between $\mathcal N_0$ and $Z$ is simply given by:

$$Z(\beta,V) = \beta\int_{U_{min}}^{+\infty}dE\;e^{-\beta E}\,\mathcal N_0(E,V) \tag{2.3.6}$$

if $U_{min}$ is the minimum of the energy and if $Z$, $\mathcal N$ are given by (2.2.1),
(2.2.9); this is checked by integrating (2.3.6) by parts over $E$; we treat here only
the case in which the continuum approximation is accepted, ($h \simeq 0$).$^4$


4 But one can check that the ensemble equivalence remains formally valid even if the cell 
sizes are not neglected provided the orthodicity notion is adapted to the new case as 
discussed in §2.2.

Hence, see §2.2:

$$F(\beta,V) = -\beta^{-1}\log Z(\beta,V) = -\beta^{-1}\log\beta - \beta^{-1}\log\int_{U_{min}}^{\infty}e^{-\beta E}\,\mathcal N_0(E,V)\,dE. \tag{2.3.7}$$
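The relation (2.3.6), on which (2.3.7) rests, can be verified numerically for the free gas, where both sides are explicit. This is a minimal sketch, assuming $\Phi = 0$ (so $U_{min} = 0$), units $m = h = 1$, a small particle number, and a plain midpoint quadrature; function names are hypothetical.

```python
import math


def log_n0(energy, n, volume, mass=1.0, h=1.0):
    """log N_0(E, V) for the free gas: the ball E(p, q) <= E in phase space."""
    log_omega = math.log(2.0) + 1.5 * n * math.log(math.pi) - math.lgamma(1.5 * n)
    return (n * math.log(volume) - 3 * n * math.log(h) - math.lgamma(n + 1)
            + log_omega - math.log(3 * n) + 1.5 * n * math.log(2.0 * mass * energy))


def z_from_n0(beta, n, volume, steps=20000):
    """beta * integral of exp(-beta E) N_0(E, V) dE over E > 0 (midpoint rule)."""
    e_max = (1.5 * n + 60.0) / beta          # the integrand is negligible beyond this
    de = e_max / steps
    total = 0.0
    for i in range(steps):
        e = (i + 0.5) * de
        total += math.exp(-beta * e + log_n0(e, n, volume)) * de
    return beta * total


def z_direct(beta, n, volume, mass=1.0, h=1.0):
    """Z(beta, V) of the free gas, computed directly from the gaussian integral."""
    return math.exp(n * math.log(volume) - math.lgamma(n + 1)
                    + 1.5 * n * math.log(2.0 * math.pi * mass / (beta * h * h)))
```

The two functions agree to quadrature accuracy; analytically the agreement reduces to $\Gamma(3N/2+1) = (3N/2)\,\Gamma(3N/2)$.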


The specific (i.e. per particle) thermodynamic quantities in the canonical
distribution $\mu$ with parameters $\beta, V$ are, in the thermodynamic limit
($V \to \infty$, $V/N = v$ fixed):

$$\begin{aligned}
f_c(\beta,v) &= \lim_{N\to\infty}\frac1N F(\beta,V) &&\text{canonical free energy}\\
u_c(\beta,v) &= \lim_{N\to\infty}\frac{U(\mu)}{N} = \frac{\partial\,\beta f_c}{\partial\beta}(\beta,v) &&\text{canonical internal energy}\\
T_c &= \lim_{N\to\infty}\frac{2}{3k_B}\frac{K(\mu)}{N} = \frac{1}{k_B\beta} &&\text{canonical absolute temperature}\\
v_c &= \frac{V}{N} &&\text{canonical specific volume}\\
p_c &= \lim_{N\to\infty}P(\mu) = -\frac{\partial f_c}{\partial v}(\beta,v) &&\text{canonical pressure}\\
s_c &= \frac{u_c - f_c}{T_c} &&\text{canonical entropy}
\end{aligned} \tag{2.3.8}$$

where in expressing $u_c$, $p_c$ as derivatives of the free energy $f_c$ via (2.2.3),
(2.2.4) the operations of differentiation and of limit have been interchanged
without discussion, because we proceed heuristically with the aim of
exhibiting the essence of the mechanism of equivalence.

The same thermodynamic quantities can be evaluated also in the mi- 
crocanonical ensemble with parameters U,V; and of course they have an 
a priori different definition: 


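$$\begin{aligned}
f_m(u_m,v_m) &= -T_m s_m + u_m &&\text{microcan. free energy}\\
u_m &= \frac{U(\mu)}{N} = \frac{U}{N} &&\text{m.c. internal energy}\\
T_m &= \frac{2}{3k_B}\frac{K(\mu)}{N} = \Big(\frac{\partial s_m}{\partial u_m}(u_m,v_m)\Big)^{-1} &&\text{m.c. abs. temperature}\\
v_m &= \frac{V}{N} &&\text{m.c. specific volume}\\
p_m &= P(\mu) = T_m\frac{\partial s_m}{\partial v_m}(u_m,v_m) &&\text{m.c. pressure}\\
s_m &= \lim_{N\to\infty}\frac{k_B}{N}\log\big(\mathcal N_0(U,V) - \mathcal N_0(U-DE,V)\big) = \\
&= \lim_{N\to\infty}\frac{k_B}{N}\log\mathcal N_0(U,V) &&\text{m.c. entropy}
\end{aligned} \tag{2.3.9}$$

where the expressions for $T_m$, $p_m$ follow from (2.2.16), the expression for
the free energy is the classical thermodynamic definition, while that of the
microcanonical entropy requires a digression.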


In the theory of the microcanonical ensemble the value of $DE$ is not
specified (and it is only subject to the condition that $DE \ll U$ and that $DE$ is
a macroscopic quantity, i.e. $DE/N \to De > 0$ as $N \to \infty$). Nevertheless the
theory of the microcanonical ensemble would proceed in the same way even
if $DE = U - U_{min}$, i.e. as large as possible, and one would still obtain an
orthodic ensemble, hence a model of thermodynamics in which the entropy
would have the “new” value $S = k_B\log\mathcal N_0(U, V)$.

The function $\bar s_m(u,v) = \lim_{N\to\infty}\frac{k_B}{N}\log\mathcal N_0(U,V)$ is monotonic
non decreasing in $u$ because such is, manifestly, $\mathcal N_0(U,V)$ and in reality
one can show that, in the cases we consider (i.e. stable and tempered potentials, see
(2.2.17),(2.2.18)), it is strictly increasing (see Chap.IV) as we should wish
because, if $\bar s_m = s_m$, the derivative $(\partial\bar s_m/\partial u_m)^{-1}$ should be
equal to the absolute temperature, which should be positive.

Hence, at the dominant order in $N \to \infty$ and ignoring problems of exchange
of limits:

$$\begin{aligned}
\mathcal N_0(U,V) &= e^{\frac{N}{k_B}\bar s_m(u,v)}\\
\frac{\mathcal N_0(U-DE,V)}{\mathcal N_0(U,V)} &= e^{\frac{N}{k_B}\left(\bar s_m(u-De,v) - \bar s_m(u,v)\right)} \equiv e^{-\alpha N}
\end{aligned} \tag{2.3.10}$$

and $\alpha > 0$ as a consequence of the strict monotonicity of $\bar s_m$ in $u$, so that
the two limits in the last of (2.3.9) coincide and $\bar s_m = s_m$. This shows
also the equivalence of the various versions of the microcanonical ensemble
determined by various choices of $DE = N\,De$ with $De > 0$.

Coming back to the equivalence between microcanonical and canonical
ensembles we fix the constant $k_B$ in (2.3.8),(2.3.9) to be the same quantity and
we see that the problem can be formulated as follows: if we establish a
correspondence between the canonical state with parameters $\beta = 1/k_BT_c$, $v = v_c$
and the microcanonical state with parameters $u = u_m$, $v = v_m$ such that
$T_c = (k_B\beta)^{-1} = T_m$ and $v_c = v_m$ then all the other quantities with the
same “name” (i.e. differing only by the label $m$ or $c$) must coincide. In
this way, because of orthodicity, all other thermodynamic quantities must
coincide. Hence, if this coincidence really takes place, the two models of
thermodynamics defined by the two ensembles will coincide.

The reason why the coincidence takes place is quite simple, if one neglects
matters of mathematical rigor and proceeds heuristically. For large $N$ one
finds, by (2.3.6) and the first of (2.3.10):

$$\begin{aligned}
Z(\beta,V) &= \beta\int_{U_{min}}^{\infty}e^{-\beta E}\,\mathcal N_0(E,V)\,dE = \\
&= N\beta\int_{u_{min}}^{\infty}e^{-\beta Nu + N s_m(u,v_m)/k_B}\,du = \\
&\simeq \mathrm{const}\,N^{1/2}\exp\Big[N\max_u\Big(-\beta u + \frac{1}{k_B}s_m(u,v_m)\Big)\Big]
\end{aligned} \tag{2.3.11}$$


so that if the maximum is attained at a unique point $u_0$, it must be that $u_0$
is such that $\beta = \frac{1}{k_B}\frac{\partial s_m}{\partial u}(u_0, v_m)$, because the derivative
with respect to $u$ must vanish in the maximum point $u_0$. Furthermore:


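$$\frac{U(\mu)}{N} = \frac{\int_{U_{min}}^{\infty}(E/N)\,e^{-\beta E}\,\mathcal N_0(E,V)\,dE}{\int_{U_{min}}^{\infty}e^{-\beta E}\,\mathcal N_0(E,V)\,dE} = \frac{\int_{u_{min}}^{\infty}u\,e^{(-\beta u + s_m(u,v_m)/k_B)N}\,du}{\int_{u_{min}}^{\infty}e^{(-\beta u + s_m(u,v_m)/k_B)N}\,du}\xrightarrow[N\to\infty]{}u_0 \tag{2.3.12}$$

because only the values of $u \simeq u_0$ will give a leading contribution to the
integrals as $N \to \infty$. Equation (2.3.12) also confirms the physical meaning
of $u_0$: it is the average energy per particle, i.e. the internal energy per
particle.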
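The concentration mechanism behind (2.3.12) can be watched at finite $N$. The sketch below uses the free gas as a hedged illustration, assuming $s_m(u,v)/k_B = \frac32\log u$ plus terms independent of $u$ (which cancel in the ratio); the grid, the cut-off $20/\beta$ and the function name are choices made here, not part of the text.

```python
import math


def mean_energy_per_particle(beta, n, steps=40000):
    """Ratio of the two u-integrals in (2.3.12) with weight exp(N g(u)),
    g(u) = -beta u + 1.5 log u, computed by the midpoint rule on (0, 20/beta)."""
    u0 = 1.5 / beta                                   # maximizer of g
    g_max = -beta * u0 + 1.5 * math.log(u0)
    du = (20.0 / beta) / steps
    num = 0.0
    den = 0.0
    for i in range(steps):
        u = (i + 0.5) * du
        # weight rescaled by its maximum so that the exponential cannot overflow
        w = math.exp(n * (-beta * u + 1.5 * math.log(u) - g_max))
        num += u * w
        den += w
    return num / den


for n in (10, 100, 1000):
    print(n, mean_energy_per_particle(2.0, n))
```

For this weight the ratio can also be computed exactly and equals $(3/2 + 1/N)/\beta$, so the distance from $u_0 = 3/(2\beta)$ decreases like $1/N$.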

Recalling the relation remarked after (2.3.11) between $u_0$ and $\beta$ and the
fact that $u_c = u_0$, we have:

$$T_c = \frac{1}{k_B\beta} = \Big(\frac{\partial s_m}{\partial u}(u_c,v_m)\Big)^{-1} = T_m(u_c,v_m) \tag{2.3.13}$$

and choosing $v_c = v_m$ and $T_c$ so that $T_c = T_m(u_m,v_m)$ it follows that
$u_c = u_m = u_0$, from the third of (2.3.8), (2.3.9).

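It remains to check that $f_m(u_m,v_m) = f_c(\beta,v_c)$; this follows from (2.3.11)
which tells us that, for $N \to \infty$:

$$\begin{aligned}
f_c(\beta,v_m) &= -\beta^{-1}\max_u\big(-\beta u + s_m(u,v_m)/k_B\big) = \\
&= -\beta^{-1}\big(-\beta u_c + s_m(u_c,v_m)/k_B\big) = \\
&= u_c - T_c\,s_m(u_c,v_m) = \\
&= u_m - T_m\,s_m(u_m,v_m) = f_m(u_m,v_m)
\end{aligned} \tag{2.3.14}$$

because $T_c = T_m$, $u_c = u_m$.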
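The variational formula in (2.3.14) can be tried out on the free gas, where both ensembles are explicit. This is a sketch under stated assumptions: units $k_B = m = h = 1$, the leading-order entropy of (2.3.2) with its additive constant fixed via Stirling’s formula, and a ternary search standing in for the maximization; all names are illustrative.

```python
import math


def s_m(u, v):
    """Free-gas microcanonical entropy per particle (leading order, k_B = m = h = 1)."""
    return math.log(v) + 1.5 * math.log(u) + 1.5 * math.log(4.0 * math.pi / 3.0) + 2.5


def argmax_concave(func, lo, hi, iterations=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if func(m1) < func(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def canonical_from_microcanonical(beta, v):
    """Return (u_0, f_c) with f_c = -beta^-1 max_u(-beta u + s_m(u, v))."""
    u0 = argmax_concave(lambda u: -beta * u + s_m(u, v), 1e-9, 100.0 / beta)
    return u0, -(-beta * u0 + s_m(u0, v)) / beta


def f_c_direct(beta, v):
    """Per-particle limit of F = -beta^-1 log Z for the free gas, same units."""
    return -(math.log(v) + 1.0 + 1.5 * math.log(2.0 * math.pi / beta)) / beta
```

The maximizer comes out at $u_0 = 3/(2\beta)$, i.e. $T_m = (\partial s_m/\partial u)^{-1} = 1/\beta = T_c$, and the two free energies coincide, which is the content of (2.3.13) and (2.3.14) in this special case.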

The identity between the free energy, internal energy and absolute tem- 
peratures implies (since the ensembles are orthodic, and therefore the usual 
thermodynamic relations hold) that of the entropies; so that the two en- 
sembles describe the same thermodynamics. 


§2.4. Non Equivalence of the Canonical and Microcanonical Ensembles.
Phase Transitions. Boltzmann’s Constant


The derivation in §2.3 is classical but nonrigorous: it can be made rigorous
via a more detailed analysis of the qualitative properties of the functions
$s_m(u, v)$ and $f_c(\beta, v)$: the central point of a rigorous proof of equivalence is
in showing that $s_m(u, v)$ is “well approximated” (for $N$ large) by $S(U,V)/N$
and, furthermore, it is a convex function of $u$ and a convex function of
$v$, while $f_c(\beta,v)$ is concave in both variables $\beta, v$. This implies that the
maximum in (2.3.11) is actually reached at a point $u_0$ or, possibly, in an
interval $(u_-, u_+)$ where the function $\beta u - s_m(u, v)$ is constant in $u$.

A more detailed analysis of the question is postponed to Chap.IV: it is
however useful to mention that such an analysis requires making use of
the stability and temperedness properties of the interparticle interaction
potential $\varphi$.

As one can predict from the above discussion, the proof (rigorous
or not) of equivalence between the canonical and microcanonical ensembles
no longer works, in general, if the maximum in (2.3.11) is reached on an
interval $(u_-,u_+)$, $u_- < u_+$, rather than at a single point.

By the general properties of concave functions, one can see that this
possibility can be realized only for exceptional values of $\beta$ (and precisely for a
set of values forming “at most” a denumerable set). This means that, for
exceptional values of $\beta$, i.e. of the temperature, corresponding elements of
the canonical and microcanonical ensembles may be not equivalent.

Such values of $\beta$ are exceptional, if they exist at all; therefore it must
happen that as close as we wish to one of them, call it $\bar\beta$, there exist values
$\beta'$ and $\beta''$ which are not exceptional ($\beta'' < \bar\beta < \beta'$).

For $\beta = \beta'$ or $\beta = \beta''$ there is equivalence of the corresponding elements
of the canonical and microcanonical ensembles; and in one case the internal
energy will be $u' < u_-$ and in the other it will be $u'' > u_+$, having denoted
by $(u_-, u_+)$ the interval on which the function
$-g(u, v) = (-\bar\beta u + s_m(u, v))$ takes its maximum in $u$ for $\beta = \bar\beta$, as
illustrated in Fig. 2.4.1:$^5$

Fig. 2.4.1: Graph of $-\beta u + s_m(u,v)$ for different values of $\beta$ (the curves
are labeled $-\bar\beta u + s_m$ and $-\beta' u + s_m$).


Hence we see that if for $\beta = \bar\beta$ the canonical and microcanonical states
are not, or may not be, equivalent then it must be that the internal energy
$u_c(\beta, v)$ shows a discontinuity jumping from $u_-$ to $u_+$ when $\beta$ is varied across
$\bar\beta$. Consequently also the specific entropy $s_c(\beta, v)$ must show a discontinuity
because $f_c(\beta,v) = u_c - T_c s_c$ is necessarily continuous being convex, as
mentioned above.
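The jump of the internal energy, with a continuous free energy, can be reproduced with a toy entropy containing a linear piece. The function `s_toy` below is a hypothetical example invented for this illustration (units $k_B = 1$, plateau on $[u_-, u_+] = [1, 2]$ with slope $\bar\beta = 1$); it is not derived from any interaction potential.

```python
import math


def s_toy(u):
    """Hypothetical concave, continuously differentiable entropy with a linear
    piece of slope 1 on [1, 2]; chosen only to exhibit a plateau."""
    if u <= 1.0:
        return math.log(u)
    if u <= 2.0:
        return u - 1.0
    return 1.0 + math.log(u - 1.0)


def argmax_concave(func, lo, hi, iterations=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if func(m1) < func(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def u_c(beta):
    """Internal energy selected by the canonical ensemble: argmax of -beta u + s(u)."""
    return argmax_concave(lambda u: -beta * u + s_toy(u), 1e-6, 1e3)


def f_c(beta):
    """Free energy -beta^-1 max_u(-beta u + s(u))."""
    u = u_c(beta)
    return -(-beta * u + s_toy(u)) / beta


for beta in (1.01, 1.0001, 0.9999, 0.99):
    print(beta, u_c(beta), f_c(beta))
```

As $\beta$ decreases through $\bar\beta = 1$ the maximizer moves from just below $u_- = 1$ to just above $u_+ = 2$, while the free energy changes only by an amount of the order of the change in $\beta$.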


5 Here the graphs will have a continuous first $u$-derivative if the inverse temperature
$(\partial s_m/\partial u)_v = T_m^{-1}$ is continuous at constant $v$: this property is usually
true but it is nontrivial to prove it. We do not discuss this matter here, but in §4.3
we shall discuss the similar question of the continuity of the pressure as a function of
the density at constant temperature. In Fig. 2.4.1 we imagine that $T_m$ is continuous
(i.e. the plateau and the curved parts merge smoothly, “inside the black disks”, to
first order).

What has just been said, rather than being an obstacle to the microscopic
formulation of thermodynamics, shows the possibility that statistical me- 
chanics can be the natural frame in which to study the phase transition 
phenomenon. In fact we see that some of the thermodynamic quantities 
can have discontinuities in terms of others, exactly of the type empirically 
observed in phase transition phenomena, where entropy and energy of two 
coexisting phases are different while the free energy is the same. 

Hence cases in which there is no equivalence between corresponding ele- 
ments of the two ensembles, or more generally when there are corresponding 
but nonequivalent elements in two orthodic ensembles, can be taken as sig- 
naling a phase transition: this is in fact the definition of phase transition 
that is commonly accepted today. 

From the point of view of Physics what happens in a case of nonequivalence 
between two elements of two orthodic ensembles with corresponding ther- 
modynamic parameters can be clarified by the following considerations. In 
general the states of an ensemble describe thermodynamic equilibria but may 
fail to describe all of them, i.e. all the equilibrium phases (corresponding for 
instance to a given free energy and temperature, or to given temperature 
and pressure). 

In other words, a given ensemble may be not rich enough to contain among
its elements $\mu$ the statistical distributions that characterize all the pure
phases or their mixtures: usually given a statistical ensemble $\mathcal E$ one will find
among the $\mu \in \mathcal E$ a distribution describing a particular mixture of coexisting
phases (if there are more phases possible with the same free energy and
temperature) but it may not contain the distributions describing the other
possible phases or mixtures.

This is precisely what can be seen to happen in the cases of the canonical 
and microcanonical ensembles, at least in the few systems in which the 
theory can be developed until such details are thoroughly brought to light. 
See Chap.V. 

We can therefore conclude, in the case just examined of the canonical and
microcanonical ensembles, that they provide equivalent descriptions of the
system thermodynamics for the parameter values to
which no phase transition is associated. In the other cases the possible
nonequivalence cannot be considered a defect of the theory, but it can be 
ascribed to the fact that, when equivalence fails, the elements of the two 
statistical ensembles that should be equivalent are not because they describe 
two different phases that may coexist (or different mixtures of coexisting 
phases). 

One of the most interesting problems of statistical mechanics emerges in 
this way: it is the problem of finding and studying cases of nonequiva- 
lence between corresponding elements of the canonical and microcanonical 
ensembles (or more generally of two orthodic ensembles). 

We conclude this section by coming back to the question of the system
independence of the Boltzmann constant $k_B$. The above discussion only
shows that the constant $k_B$ appearing in the theory of the canonical ensemble
must be the same as that appearing in the theory of the microcanonical
ensemble, if one wants the two ensembles to describe the same thermodynamics
(apart from the possible existence of phase transitions).

It is, however, easy to give a general argument showing that $k_B$ must be
system independent and, hence, it has the value given by (2.3.4) computed
for the special case of a free gas. The idea is simply that we want our models
of thermodynamics to also describe the same thermodynamics for a system
that is part of a larger system.

In fact, putting into weak contact, mechanical and thermal, two systems that
are in thermal equilibrium (i.e. that have the same temperature) one builds
a composite system in which, in the canonical ensemble, the first system will
be described by a distribution $\mu$ with parameters $(\beta, v)$ and the second by a
distribution $\mu'$ with parameters $(\beta', v')$.

We suppose for simplicity that each of the two systems contains only one
species of particle. The composite system will then be described by the
product distribution $\mu \times \mu'$ because the two systems are independent and
their mechanical interaction is supposed negligible (this is the meaning of
the phrase “weak mechanical contact”).

On the other hand the distribution $\mu \times \mu'$ must be equivalent to a suitable
distribution $\bar\mu$ for the composite system: an equilibrium distribution, and a
canonical one. In fact we accept that the thermodynamic states of a system
can be represented by the elements of an orthodic ensemble. Hence if
$\Delta$ and $\Delta'$ are two cells representing microscopic states of the two systems,
$\bar\mu(\Delta \times \Delta')$ is proportional to $\exp -\bar\beta(E(\Delta) + E(\Delta'))$, because the energy of
the microscopic state $(\Delta \times \Delta')$ is $E(\Delta) + E(\Delta')$, by the weak mechanical
interaction hypothesis. Hence:

$$\exp\bigl(-\beta E(\Delta) - \beta' E(\Delta')\bigr) = \exp\bigl(-\bar\beta\,(E(\Delta) + E(\Delta'))\bigr) \tag{2.4.1}$$

for every pair of cells $\Delta$ and $\Delta'$, hence $\beta = \beta' = \bar\beta$.
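
The step from (2.4.1) to the equality of the three inverse temperatures can be spelled out as follows. This is only a sketch: it writes $\beta$, $\beta'$, $\bar\beta$ for the canonical parameters of the two subsystems and of the composite system, it allows a normalization constant $c > 0$ in (2.4.1) (an assumption made here for the illustration), and it assumes that each subsystem has at least two cells with different energies.

```latex
\begin{align*}
% logarithm of (2.4.1), with a possible constant factor c > 0
\beta\,E(\Delta) + \beta'\,E(\Delta')
   &= \bar\beta\,\bigl(E(\Delta) + E(\Delta')\bigr) - \log c , \\
% keep \Delta' fixed and subtract the identity written for two cells
% \Delta_1, \Delta_2 of the first system with E(\Delta_1) \neq E(\Delta_2)
(\beta - \bar\beta)\,\bigl(E(\Delta_1) - E(\Delta_2)\bigr) &= 0
   \quad\Longrightarrow\quad \beta = \bar\beta , \\
% the same argument with the roles of the two systems exchanged
(\beta' - \bar\beta)\,\bigl(E(\Delta'_1) - E(\Delta'_2)\bigr) &= 0
   \quad\Longrightarrow\quad \beta' = \bar\beta .
\end{align*}
```

With $\beta = \beta' = \bar\beta$ the constant is forced to be $c = 1$, so the equality form of (2.4.1) is recovered.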

But $\beta = 1/kT$, $\beta' = 1/k'T$, $\bar\beta = 1/\bar k T$ where $T$ is the value, common
by the assumption of thermal equilibrium, of the temperature in the three
systems and $k$, $k'$, $\bar k$ are the three respective values of the constant $k_B$.

Hence $k = k' = \bar k$: i.e. $k$ is a universal constant whose actual value $k_B$ can
be deduced, as was done above in (2.3.4), from the theory of a single special
system, namely that of the free gas which is the easiest to understand.

6 Unless, perhaps, there are phase transitions, an exceptional case that here we shall
suppose not to happen as we may imagine changing by a very small amount the
thermodynamic parameters of the systems, still keeping thermal equilibrium.


§2.5. The Grand Canonical Ensemble and Other Orthodic Ensembles

It is easy to see that there exist a large number of orthodic ensembles.


• For instance the following generalization of the microcanonical ensemble,
with $DE = U - U_{\min}$, i.e. with $DE$ equal to, rather than small compared to,
$U - U_{\min}$ (as was assumed in §2.2):

$$\mu(\Delta) = \begin{cases} 1/\mathcal{N}_0(U,V) & \text{if } E(\Delta) \le U \\ 0 & \text{otherwise} \end{cases} \tag{2.5.1}$$

already considered in §2.3 (after (2.3.9)) is an orthodic ensemble, for the
reasons discussed in §2.3.

This ensemble is also called “microcanonical” (although perhaps improperly,
because this name was introduced for the case $DE = N\,De$, $De > 0$,
$De \ll U/N$). But this is a somewhat trivial example of a new orthodic
ensemble.


• A different and wide class of orthodic ensembles can be built by imagining
that other particles are fixed at positions $\bar q_1, \bar q_2, \ldots$, and by modifying
$\Phi(q)$, see (2.1.1), into $\Phi^*(q)$:

$$\Phi^*(q) = \Phi(q) + \sum_{i=1}^{N} \sum_{j} \varphi(q_i - \bar q_j) \tag{2.5.2}$$

where the sum over $j$ runs over the points $\bar q_j$ external to the volume $V$ inside
which the system particles are free to roam. The energy $\Phi^*$ has the meaning
of potential energy of the system in the presence of particles fixed at points
located outside the container.
As the shape or size of the container changes, when we vary $V$, we imagine
removing the fixed particles whose positions fall into $V$.


Starting with the potential energy (2.5.2) we form the statistical
microcanonical or canonical ensembles with energy function
$E(p,q) = K(p) + \Phi^*(q)$.

If the fixed external particles are distributed reasonably, e.g. so that each
unit cube only contains a bounded number of fixed particles, or a number
slowly increasing with the distance of the cube from the center of $V$ (i.e. if
the fixed particles are roughly distributed with uniform density) then it
can be shown (see Chap.IV) that the ensembles so obtained are orthodic,
at least in the thermodynamic limit ($V \to \infty$ with $V/N = v$, $U/N = u$ fixed
in the microcanonical case, or $V \to \infty$ with $V/N = v$, $\beta$ fixed in the canonical
case), provided the interaction potential $\varphi$ satisfies the stability and
temperedness conditions of §2.2. If we do not
wish to neglect the cell size then we should apply to such ensembles the
comments at the end of §2.2 on the notion of orthodicity.

The above new ensembles are called microcanonical or respectively canon- 
ical ensemble “with fixed particle boundary conditions”. It can be shown 
that they are equivalent, in the absence of phase transitions, to the usual 
canonical ensemble, in a sense analogous to that discussed in the previ- 
ous sections when comparing the canonical and microcanonical ensembles 
(i.e. they generate the same thermodynamics, in the thermodynamic limit). 
This can be done exactly along the same lines of argument that led to the 
equivalence between canonical and microcanonical ensembles in §2.3. 


Other orthodic ensembles can be obtained by letting $N$ or $V$ vary, i.e. by
considering simultaneously microscopic states describing systems with
different particle numbers $N$ or occupying different volumes $V$.


• An example which is very important in many applications is the grand
canonical ensemble: its elements depend on two parameters $\beta > 0$ and $\lambda$.
They are probability distributions on the cells $\Delta$ representing the states
of an $N$ particle system in a given volume $V$, with $N = 0, 1, 2, \ldots$; if
$E(\Delta) = E(p,q) = K(p) + \Phi(q)$ and if $N(\Delta)$ = number of particles in the
microscopic state $\Delta$ then:

$$\mu(\Delta) = \frac{e^{-\beta\lambda N(\Delta) - \beta E(\Delta)}}{\Xi(\beta,\lambda)} \tag{2.5.3}$$

where the denominator is called the grand canonical partition function

$$\Xi(\beta,\lambda) = \sum_{\Delta} e^{-\beta\lambda N(\Delta) - \beta E(\Delta)} \tag{2.5.4}$$

and the thermodynamic limit consists simply in letting $V \to \infty$ keeping $\lambda, \beta$
fixed.


• More generally one can replace $\Phi(q)$ with the potential energy $\Phi^*(q)$
described in (2.5.2); in this last case we talk about a grand canonical ensemble
“with fixed particle boundary conditions”.


• A further class of orthodic ensembles is provided by the pressure ensemble:
it also admits variants with fixed particle boundary conditions. In this
ensemble one fixes $N$ but the container $V$ is thought of as variable and
capable of taking various volume values $V_1 = V^0$, $V_2 = 2V^0$, $V_3 = 3V^0, \ldots$,
the shape always remaining cubic.

If $\Delta$ is a cell describing a microscopic state with $N$ particles enclosed in a
container $V(\Delta)$ and having energy $E(\Delta)$ one defines for each value of the
two parameters $p > 0$, $\beta > 0$:

$$\mu(\Delta) = \frac{e^{-\beta p V(\Delta) - \beta E(\Delta)}}{J_N(\beta,p)} \tag{2.5.5}$$

where the denominator is called the partition function of the pressure
ensemble, and

$$J_N(\beta,p) = \sum_{j=0}^{\infty}\ \sum_{\Delta:\ V(\Delta)=V_j} e^{-\beta p V(\Delta) - \beta E(\Delta)} \tag{2.5.6}$$

The thermodynamic limit simply consists in letting $N$ tend to infinity.


Remark: One can also imagine taking the containers to be capable of
assuming a continuum of values (e.g. any volume $V$, keeping it, however,
homothetic to a reference shape $V^0$, e.g. to a unit cube); in this case the
sum over the volumes should be replaced by an integral $(V^0)^{-1} \int_0^\infty dV$. A
simple application of the pressure ensemble will be found in §5.8.
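
The grand canonical distribution (2.5.3) and its partition function (2.5.4) can be made concrete on a toy model. The sketch below is not taken from the text: it assumes a hypothetical lattice gas with `n_sites` sites, at most one particle per site and zero interaction energy, so that a microscopic state is just an occupation pattern. The names `grand_canonical` and `zero_energy` are introduced here for illustration only.

```python
import itertools
import math

def grand_canonical(beta, lam, energy, n_sites):
    """Return (Xi, mu): partition function (2.5.4) and distribution (2.5.3)
    over all occupation patterns of a toy lattice with n_sites sites."""
    weights = {}
    for occ in itertools.product((0, 1), repeat=n_sites):
        n = sum(occ)  # N(Delta): number of particles in this state
        weights[occ] = math.exp(-beta * lam * n - beta * energy(occ))
    xi = sum(weights.values())
    mu = {occ: w / xi for occ, w in weights.items()}
    return xi, mu

def zero_energy(occ):
    """Assumed toy energy: non-interacting particles, E(Delta) = 0."""
    return 0.0

beta, lam, n_sites = 1.3, -0.4, 6
xi, mu = grand_canonical(beta, lam, zero_energy, n_sites)
mean_n = sum(p * sum(occ) for occ, p in mu.items())
z = math.exp(-beta * lam)  # weight of one occupied site

print(xi, (1 + z) ** n_sites)          # the sum factorizes site by site
print(mean_n, n_sites * z / (1 + z))   # average particle number
```

For this non-interacting toy model the sum over states factorizes, which is why the enumerated value of the partition function can be compared with the product form printed next to it.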


The theory of the grand canonical and pressure ensembles, as well as the
theory of the various ensembles with fixed particle boundary conditions,
can be developed by showing their equivalence with the canonical, or
microcanonical ensemble, and this can be done via the method of the
“maximum value” that we have described in §2.3: it will work if the interparticle
potential $\varphi$ satisfies the stability and temperedness conditions, see (2.2.17)
and (2.2.18), [Fi64],[Ru69].

As a further example of the maximum value method, of very common use in 
statistical mechanics, we deduce some of the properties of the grand canon- 
ical ensemble from the corresponding properties of the canonical ensemble 
and show their equivalence. Again we proceed heuristically, by ignoring 
problems of mathematical rigor. 

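If $\mu$ is a generic element of the grand canonical ensemble corresponding to
the parameters $\lambda, \beta$ one has, see (2.3.7) and the first of (2.3.8):

$$\begin{aligned}
\Xi(\beta,\lambda) &= \sum_{N=0}^{\infty}\ \sum_{\Delta,\,N(\Delta)=N} e^{-\beta\lambda N - \beta E(\Delta)}
 = \sum_{N=0}^{\infty} e^{-\beta\lambda N} Z_N(\beta,V) \\
&= \sum_{N=0}^{\infty} \exp V\bigl(-\beta\lambda v^{-1} - \beta v^{-1} f_c(\beta,v)\bigr)
\end{aligned} \tag{2.5.7}$$

where in the last sum $v = V/N$ and $Z_N(\beta,V)$ is the canonical partition
function for $N$ particles in the volume $V$ and with temperature $T = 1/k_B\beta$.
Hence for $V \to \infty$, and if $v_0$ is the value where the function
$-\beta\lambda v^{-1} - \beta v^{-1} f_c(\beta,v)$, of the variable $v$, attains its maximum, we find:

$$\lim_{V\to\infty} \frac{1}{V} \log \Xi(\beta,\lambda) = -\beta\lambda v_0^{-1} - \beta v_0^{-1} f_c(\beta,v_0) \tag{2.5.8}$$

assuming that the maximum point $v_0$ is unique. Here $v_0$ satisfies (if one
recalls that by (2.3.8) $p_c = -\frac{\partial f_c}{\partial v}(\beta,v)$)

$$\frac{\partial}{\partial v^{-1}}\bigl(\lambda v^{-1} + v^{-1} f_c(\beta,v)\bigr)\Big|_{v=v_0} = 0 \ \longrightarrow\ \lambda + f_c(\beta,v_0) + v_0\,p_c(\beta,v_0) = 0 \tag{2.5.9}$$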
On the other hand $v_0$ has the interpretation of grand canonical specific
volume $v_g$ because:

$$\begin{aligned}
v_g^{-1} &= \lim_{V\to\infty} \frac{\sum_N (N/V)\, e^{-\beta\lambda N} Z_N(\beta,V)}{\sum_N e^{-\beta\lambda N} Z_N(\beta,V)} \\
&= \lim_{V\to\infty} \frac{\sum_N N V^{-1}\, e^{-(\beta\lambda v^{-1} + \beta v^{-1} f_c(\beta,v))V}}{\sum_N e^{-(\beta\lambda v^{-1} + \beta v^{-1} f_c(\beta,v))V}} = v_0^{-1}
\end{aligned} \tag{2.5.10}$$

due to the maximum of $-\beta\lambda v^{-1} - \beta v^{-1} f_c(\beta,v)$ being isolated and at the
point $v_0$.
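
The maximum value argument behind (2.5.8) and (2.5.10) can be tried numerically. The sketch below is an illustration under stated assumptions, not the text's own computation: it takes a free gas with canonical partition function $Z_N = (V/\ell)^N/N!$, where the thermal volume $\ell$ is set to 1, so that the exact result is $(1/V)\log\Xi = e^{-\beta\lambda}$ and the density is $e^{-\beta\lambda}$. The function names are hypothetical.

```python
import math

def log_term(n, volume, beta_lambda, thermal_volume=1.0):
    """log of exp(-beta*lambda*N) * Z_N for the assumed free gas,
    Z_N = (V / thermal_volume)**N / N!."""
    return (-beta_lambda * n
            + n * math.log(volume / thermal_volume)
            - math.lgamma(n + 1))

def grand_canonical_summary(volume, beta_lambda, n_max):
    """Compare the full sum over N with its largest term.

    Returns (log Xi / V, log(largest term) / V, N*/V, <N>/V)."""
    logs = [log_term(n, volume, beta_lambda) for n in range(n_max + 1)]
    top = max(logs)
    n_star = logs.index(top)
    # log-sum-exp keeps the sum stable for large volumes
    log_xi = top + math.log(sum(math.exp(x - top) for x in logs))
    weights = [math.exp(x - log_xi) for x in logs]
    mean_density = sum(n * w for n, w in enumerate(weights)) / volume
    return log_xi / volume, top / volume, n_star / volume, mean_density

beta_lambda = 0.5
activity = math.exp(-beta_lambda)  # exact density of the assumed free gas
for volume in (10.0, 100.0, 1000.0):
    n_max = int(5 * activity * volume) + 50  # far beyond the relevant N
    print(volume, grand_canonical_summary(volume, beta_lambda, n_max))
```

As the volume grows, the largest term alone reproduces $(1/V)\log\Xi$ up to a correction that vanishes with $V$, and the location of the maximum approaches the average density, which is the content of $v_g = v_0$.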

Therefore from (2.5.9), and classical thermodynamics, one finds the physical
meaning of $\lambda$:

$$-\lambda N = F + pV = U - TS + pV = N\bigl(f_c(\beta,v_g) + v_g\,p_c(\beta,v_g)\bigr) \tag{2.5.11}$$

i.e. $-\lambda N$ is the Gibbs potential corresponding to the parameters $(\beta, v_g)$.
Furthermore from (2.5.8) one finds that

$$\lim_{V\to\infty} \frac{1}{V} \log \Xi(\beta,\lambda) = \beta\,p_c(\beta,v_0) \tag{2.5.12}$$

i.e. the grand canonical partition function is directly related to the canonical
pressure associated with the parameters $(\beta, v_g)$.
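
For convenience, the short computation that turns (2.5.8) into (2.5.12) can be written out; it only uses the stationarity condition (2.5.9), with $\lambda$, $\beta$ the grand canonical parameters, $v_0$ the maximum point, and $f_c$, $p_c$ the canonical free energy and pressure.

```latex
\begin{align*}
\lim_{V\to\infty}\frac{1}{V}\log\Xi(\beta,\lambda)
  &= -\beta\,v_0^{-1}\bigl(\lambda + f_c(\beta,v_0)\bigr)
     && \text{by (2.5.8)} \\
  &= -\beta\,v_0^{-1}\bigl(-v_0\,p_c(\beta,v_0)\bigr)
     && \text{by (2.5.9)} \\
  &= \beta\,p_c(\beta,v_0) .
\end{align*}
```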

This suggests that the grand canonical and the canonical ensembles are
equivalent if the elements with parameters $(\lambda, \beta)$ and $(\beta, v_g)$ with $v_g = v_0$,
see (2.5.9), are put in correspondence (i.e. are thought to describe the same
macroscopic state). This can be checked by setting, see (2.1.3):

$$\begin{aligned}
u_g &= \lim_{V\to\infty} \sum_{\Delta} \mu(\Delta)\,E(\Delta)/N(\Delta) \\
T_g &= \lim_{V\to\infty} (2/3k_B) \sum_{\Delta} \mu(\Delta)\,K(\Delta)/N(\Delta) \\
v_g &= \lim_{V\to\infty} \sum_{\Delta} \mu(\Delta)\,V/N(\Delta) \\
p_g &= \lim_{V\to\infty} \sum_{\Delta} \mu(\Delta)\,p(\Delta) \\
s_g &= \bigl(u_g - \lim_{V\to\infty} \beta^{-1}(1/V)\log\Xi(\beta,\lambda)\bigr)/T_g
\end{aligned} \tag{2.5.13}$$

and by showing the identity between the above quantities computed in the
grand canonical ensemble with parameters $(\lambda, \beta)$ and the quantities with the
same name computed in the canonical ensemble with parameters $(\beta, v_g)$.


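Using the fact that $T_c(\beta,v) = \frac{1}{k_B\beta}$ and

$$T_g = \lim_{V\to\infty} \frac{\sum_N e^{-\beta\lambda N} Z_N(\beta,V)\,T_c(\beta,V/N)}{\sum_N e^{-\beta\lambda N} Z_N(\beta,V)} = \frac{1}{k_B\beta} \tag{2.5.14}$$

we see that, for the same reason used in deriving $v_g = v_0$ in (2.5.10):

$$\begin{aligned}
u_g &= \lim_{V\to\infty} \frac{\sum_N e^{-\beta\lambda N} Z_N(\beta,V)\,u_c(\beta,V/N)}{\sum_N e^{-\beta\lambda N} Z_N(\beta,V)} = u_c(\beta,v_0) \\
p_g &= \lim_{V\to\infty} \frac{\sum_N e^{-\beta\lambda N} Z_N(\beta,V)\,p_c(\beta,V/N)}{\sum_N e^{-\beta\lambda N} Z_N(\beta,V)} = p_c(\beta,v_g),
\end{aligned} \tag{2.5.15}$$

and (2.5.14), (2.5.15), (2.5.12) clearly show that all grand canonical
thermodynamic quantities coincide with the corresponding canonical quantities.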

As in the previous case of canonical versus microcanonical ensembles the
above analysis is not rigorous because it involves various interchanges of
limits and, furthermore, it presupposes that $(-\lambda v^{-1} - v^{-1} f_c(\beta,v))$ has a
unique isolated maximum (as a function of $v$); but under the assumptions
of stability and temperedness of §2.2, (2.2.17), (2.2.18), the problems of
mathematical rigor can again be solved, see Chap.IV.

From the theory of the canonical ensemble, which we shall elaborate on
in more detail in Chap.IV, it follows also that the function $-v^{-1} f_c(\beta,v)$ is
convex in $\beta$ and in $v^{-1}$ so that, with “few” exceptional values (i.e. at most
a denumerable family of values of $\lambda$),7 the function $-(\lambda v^{-1} + v^{-1} f_c(\beta,v))$
has a unique maximum point as a function of $v^{-1}$. For $\lambda$ outside this
exceptional set there is complete equivalence between thermodynamics of
the equilibrium states in terms of the elements of the canonical and grand
canonical ensembles.

For the other values of $\lambda$ (if any) the function $-(\lambda v^{-1} + v^{-1} f_c(\beta,v))$ takes
the maximum value in an interval $(v_-, v_+)$, as implied by the general
properties of concave functions, see Fig. 2.4.1; in such cases the descriptions
of states in terms of canonical or grand canonical distributions may be
nonequivalent. But the interpretation of the nonequivalence is again that
of occurrence of a phase transition: nonequivalence has to be interpreted
by attributing it to the fact that the distributions in question describe two
different equilibria that can coexist in thermodynamic equilibrium (i.e. they
both have the same temperature and pressure, but different specific volume,
entropy, etc) in the same sense as discussed in §2.4.

One of the main results of statistical mechanics which we wish to quote 
with more detail has been that of showing that, at least in many interesting 
cases, there is complete equivalence between the ensembles which will be 
called here enlarged ensembles: such ensembles are obtained from a given 
ensemble of stationary distributions (like the canonical, microcanonical or 
grand canonical) by adding to it all the distributions with boundary condi- 
tions of (arbitrarily) fixed external particles. 

In such larger ensembles it may still happen that two given states,
corresponding to the same values of temperature and pressure, have different
averages for other thermodynamic quantities (energy, entropy, specific
volume, etc); but it will happen that for every element of one ensemble there
is another in the other ensemble describing exactly the same macroscopic
state and thermodynamics, i.e. one that associates the same values to all
thermodynamic quantities and even the same relative probability distribution
to the most probable microscopic states.


7 Of course a denumerable set of values has zero length but it might be quite large in
other senses: for instance it could be dense! Therefore this easy way of saying that
phase transitions are “rare” is very unsatisfactory. But a more detailed analysis is very
difficult and perhaps impossible at the level of generality in which we are discussing
the matter. More detailed statements, e.g. that the discontinuities in $\lambda$ take place at
finitely many values of $\beta$ and in the plane $\lambda, \beta$ they occur on smooth lines, can be
derived only when considering very special cases; see Chap. VI. One can ask whether the
pressure $-\bigl(\frac{\partial f_m}{\partial v}\bigr)_\beta = p$ is a continuous function of $v$: if so $\lambda v^{-1} + v^{-1} f_m(\beta,v)$ will have
continuous $v$-derivative and we could draw a figure similar to Fig. 2.4.1 in §2.4. Also in
this case, see the corresponding comment about the temperature in the microcanonical
ensemble in footnote 5 in §2.4, it is nontrivial to show that $p$ is a continuous function of
the specific volume at constant temperature: in §4.3 we shall discuss this in more detail,
in the case of hard core systems.

In other words we can say that even the phase transition phenomenon 
can be studied in an enlarged ensemble without worrying that in this way 
one may “miss” some phases, because the enlarged statistical ensembles are 
often rich enough to contain all possible phases and their mixtures. 

This is the way to understand the nonequivalence between ensembles which 
normally describe correctly the thermodynamics of the system (i.e. that are 
orthodic). One can, in fact, think that, given a state of a system, one can 
look at an ideal macroscopic region which however occupies a volume less 
than the total; then the system in this volume can be regarded as a system in 
equilibrium with its surroundings. Such a system will have a fixed volume 
but a variable particle number. One would thus describe it naturally in 
the grand canonical ensemble; however the particles in the system have 
interactions with the identical particles that are outside the ideal volume 
selected. 

If one imagines taking a picture of the configuration one will see a sample of 
the configuration in the inner volume and one in the outer volume. Taking 
the highly imaginative step of collecting only the pictures in which the 
external configuration is the same we should still see statistically the same 
state inside the ideal box: this means that the state we see in the ideal 
box is determined by the state of the particles outside it provided they are 
chosen in a configuration “typical” for the state that is being considered. 
If this is so we can expect that the grand canonical ensemble with fixed 
particle boundary conditions can describe all possible states. 

When there is more than one equilibrium state we can describe them by 
selecting at random a configuration of the system and forming the grand 
canonical distribution in a large volume with boundary conditions given by 
the selected configuration of particles. 

Note that from this viewpoint the phenomenon of phase transitions ap- 
pears as an instability of the thermodynamic properties of a system with 
respect to variations of boundary conditions: for instance keeping the same 
temperature and pressure but changing boundary conditions one can ob- 
tain different values for intensive thermodynamic quantities like the specific 
energy, the specific entropy, the specific volume, etc, i.e. by changing the 
forces that act near the boundary of our system we can change the macro- 
scopic state even if the system is very large (hence the boundary is far and 
relatively small compared to the volume). 

In a sense this is a further manifestation of the richness of statistical me- 
chanics: such a complex phenomenon as a phase transition seems to find 
its natural theoretical setting, and the bases for its analysis, in the theory of 
(orthodic) ensembles.

In a given statistical ensemble macroscopic thermodynamic quantities
appear either as parameters of the ensemble, e.g. $u, v$ in the microcanonical
case, $\beta, v$ in the canonical and $\lambda, \beta$ in the grand canonical, or as quantities
directly related to the ensemble partition function such as the entropy, the
free energy or the pressure in the above three cases, or they are related to
derivatives of the partition function like the temperature, the pressure and
the energy in the three cases, respectively.

It can be shown, see Chap.IV, that the first two types of quantities do
not depend on the boundary conditions (imagining the latter to be taken
as fixed particle boundary conditions). Hence a way of searching for phase
transitions in particular models (i.e. in systems obtained by assuming
specific choices for the interaction potentials) is to look for parameter values of
the chosen statistical ensemble (e.g. $u, v$ in the microcanonical case, $\beta, v$ in
the canonical, $\lambda, \beta$ in the grand canonical and $\beta, p$ in the pressure ensemble
case) at which the thermodynamic function associated
with the partition function is not differentiable, [Ru69].

This is a method that has become classical: it has, however, the defect of 
not directly providing a microscopic description of the equilibrium states 
describing the different possible phases. It determines the location of the 
phase transition, in the thermodynamic parameter space of the ensemble 
adopted for the analysis: but it does not analyze the characteristic physi- 
cal peculiarities of the possible microscopic distributions that describe the 
various phases. 

On the other hand, the study of the boundary condition dependence of the 
equilibrium states of an “enlarged ensemble” is potentially richer in infor- 
mation and it can lead to a microscopic description of the phase transition 
and phase coexistence phenomena, because each state of thermodynamic 
equilibrium is described in detail by a probability distribution of its micro- 
scopic configurations. The best understanding is obtained by examining in 
detail some simple case (there are not, however, many cases in which the 
above statements can be followed and checked in detail): this will be the 
theme of Chap.VI, where the Ising model for ferromagnetism will be dis- 
cussed in connection with the spontaneous magnetization phase transition, 
and Chap.VII where other simple models are discussed. 

Thus we have met two possible definitions of phase transitions. A system 
shows a phase transition if a derivative of the thermodynamic function asso- 
ciated with the partition function of an orthodic ensemble has a discontinu- 
ity as a function of the parameters describing the elements of the ensemble. 
Alternatively: a phase transition occurs if by changing the boundary condi- 
tions in the elements of an enlarged orthodic ensemble one can change, in 
correspondence of suitable values of the parameters describing the elements 
of the ensemble, bulk properties of the system. 

We conclude by mentioning that if one develops the thermodynamics model
associated with the pressure ensemble along the above lines one easily checks
the equivalence between pressure ensemble and canonical ensemble and one
finds that the Gibbs potential $\lambda = pv + u - Ts$ is related to the pressure
ensemble partition function by:

$$\beta\lambda = -\lim_{N\to\infty} \frac{1}{N} \log J_N(\beta,p) \tag{2.5.16}$$

In fact one deduces, from the definition (recalling that $V^0$ is the volume of
the reference container box, see the remark following (2.5.6)):

$$J_N(\beta,p) = \int_0^\infty \frac{dV}{V^0}\, e^{-\beta p V} Z_N(\beta,V) \tag{2.5.17}$$

that $-\frac{1}{N} \log J_N(\beta,p) \xrightarrow[N\to\infty]{} \beta\lambda = \min_v\,\bigl(\beta p v + \beta f_c(\beta,v)\bigr)$, so that the
minimum is at $v_p$ such that $\beta p = -\beta\,\frac{\partial f_c}{\partial v}(\beta,v_p)$. Hence we see that $p$ is identified
with the pressure and, by (2.5.16), $\lambda$ with the Gibbs potential.
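
The relation between the pressure ensemble partition function and the Gibbs potential can be checked on the simplest example. The sketch below is an illustration under assumptions that are not in the text: a free gas with $Z_N(\beta,V) = (V/\ell)^N/N!$ (thermal volume $\ell$, set to 1 by default), for which $\beta f_c(\beta,v) = -(\log(v/\ell) + 1)$ and the volume integral can be done in closed form, $\int_0^\infty V^N e^{-aV}\,dV = N!/a^{N+1}$. All function names are hypothetical.

```python
import math

def minus_log_J_over_N(n, beta_p, thermal_volume=1.0, v_ref=1.0):
    """-(1/N) log J_N for the assumed free gas, using the closed form
    of the integral over V: Gamma(N+1) / (beta*p)**(N+1)."""
    log_integral = math.lgamma(n + 1) - (n + 1) * math.log(beta_p)
    log_j = (log_integral
             - math.lgamma(n + 1)             # the 1/N! in Z_N
             - n * math.log(thermal_volume)   # the thermal volume factor
             - math.log(v_ref))               # the reference volume V^0
    return -log_j / n

def beta_f_c(v, thermal_volume=1.0):
    """beta * f_c(beta, v) of the assumed free gas (Stirling limit)."""
    return -(math.log(v / thermal_volume) + 1.0)

def beta_lambda_by_minimum(beta_p, grid=20000, v_max=20.0):
    """min over v of (beta*p*v + beta*f_c(beta, v)), by a plain grid scan."""
    return min(beta_p * v + beta_f_c(v)
               for v in (v_max * (k + 1) / grid for k in range(grid)))

beta_p = 2.0
print(beta_lambda_by_minimum(beta_p))      # variational side
print(minus_log_J_over_N(10**6, beta_p))   # pressure ensemble side, large N
print(math.log(beta_p))                    # closed form for this toy gas
```

For this toy gas the minimum is attained at $v_p = 1/\beta p$, i.e. at the specific volume given by the perfect gas law, which is the identification of $p$ with the pressure in this special case.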


§2.6. Some Technical Aspects 


Some details concerning the derivation of the mathematical identities used
in §2.1, §2.2 and some related matters will be provided in this section:
namely we shall derive equations (2.2.4), (2.2.12), (2.2.2). The reader
wishing to delve into the subject can begin by consulting [Fi64], [Mi68], [Ru69],
[La72], [LL72].


(1) It is certainly worth commenting on the step from the last of (2.2.3) to 
(2.2.4). 

In the last of (2.2.3) one can make use of the independence of the integrals
performed with respect to the variables $p$ from those performed with respect
to the variables $q$, and the symmetry of the $p$-components of the integrand.

In this way one can replace $2mNv^2$ by $mNv^2$ and eliminate the condition
$v > 0$; then $mNv^2$ can be replaced by $N p_1^2/3m$, thus taking advantage of
the symmetry of the $p_1$ dependence in the three components of $p_1$.

Hence one can replace the integral on $p_1$ that in (2.2.3) is:

$$\int_{v>0} e^{-\beta p_1^2/2m}\, 2mv^2\, dp_1 \quad\text{with}\quad \int e^{-\beta p_1^2/2m}\, \frac{p_1^2}{3m}\, dp_1 \tag{2.6.1}$$

and a simple calculation shows that:

$$\int \frac{p^2}{3m}\, e^{-\beta p^2/2m}\, dp = \frac{1}{\beta} \int e^{-\beta p^2/2m}\, dp, \tag{2.6.2}$$

so that:

$$p = \sum_Q \frac{N s}{\beta S} \int \frac{e^{-\beta E(p,q)}}{Z(\beta,V)}\, \frac{dq_2 \cdots dq_N\, dp_1 \cdots dp_N}{h^{3N} N!} \tag{2.6.3}$$

and the point $q_1$ is, in each addend of (2.6.3), localized in $Q$ (which is
supposed so small that it has no importance where $q_1$ exactly is inside
$Q$).

We now imagine varying $V$ from $V$ to $V + dV$, increasing the volume by
displacing, along the outer normal, by $\eta$ every area element of the surface.
We see that $\log Z(\beta,V)$ varies, since $dV = S\eta$, by:

$$\begin{aligned}
d\log Z(\beta,V) &= \sum_Q N s \eta \int \frac{e^{-\beta E(p,q)}}{Z(\beta,V)}\, \frac{dq_2 \cdots dq_N\, dp_1 \cdots dp_N}{h^{3N} N!} \\
&= \beta\, dV \sum_Q \frac{N s}{\beta S} \int \frac{e^{-\beta E(p,q)}}{Z(\beta,V)}\, \frac{dq_2 \cdots dq_N\, dp_1 \cdots dp_N}{h^{3N} N!}
\end{aligned} \tag{2.6.4}$$

which, comparing with (2.6.3), proves (2.2.4).


(2) Another deduction calling for further details is the step from (2.2.11) to 
(2.2.12). 

By proceeding as in the derivation of (2.6.3) one finds $p\,dV$ starting from
the expression for $p$ as the average $P(\mu)$ with respect to the microcanonical
distribution $\mu$ with parameters $(U,V)$. Denoting by $\int^*$ the integral over $(p,q)$
extended to the domain of the $(p,q)$ such that $U - DE \le E(p,q) \le U$ and,
at the same time, $q_1 \in dV$ (the thin layer formed by the union over the
surface elements $Q$):

$$p\,dV = \frac{N}{\mathcal{N}(U,V)} \int^* \frac{2}{3}\,\frac{p_1^2}{2m}\, \frac{dp\,dq}{h^{3N} N!} = \frac{2}{3\,\mathcal{N}(U,V)} \int^* K(p)\, \frac{dp\,dq}{h^{3N} N!} \tag{2.6.5}$$

having again used in the last step the symmetry of $K(p)$ in $p_1, \ldots, p_N$ (to
eliminate the factor $N$) and having written $\sum_Q s\,\eta\;\cdot = \int_{dV} \cdot\; dq_1$ to obtain
a more elegant form (formally eliminating the summations over $Q$ naturally
appearing, in conformity to its definition, in the expression of the pressure).

To connect (2.6.5) with the derivatives of $\mathcal{N}$ we have to make more explicit
the dependence of $\mathcal{N}$ on $U$, by evaluating exactly the integral (2.2.9) on the
$p$ variables in polar coordinates (which is an elementary integral).

If $\Omega(3N)$ is the surface of the unit sphere in $3N$ dimensions and if we set
$w(U,q) = \sqrt{2m(U - \Phi(q))}$, we deduce

$$\mathcal{N}(U,V) = \int \frac{dq}{h^{3N} N!}\, \bigl(w(U,q)^{3N} - w(U - DE,q)^{3N}\bigr)\, \frac{\Omega(3N)}{3N} \tag{2.6.6}$$

hence:

$$\frac{\partial \mathcal{N}}{\partial U} = \int \frac{dq}{h^{3N} N!}\, \frac{3N}{2}\, 2m\, \bigl(w(U,q)^{3N-2} - w(U - DE,q)^{3N-2}\bigr)\, \frac{\Omega(3N)}{3N} \tag{2.6.7}$$

and coming back to the original coordinates

$$\frac{1}{\mathcal{N}} \frac{\partial \mathcal{N}}{\partial U} = \frac{3N}{2}\, \frac{1}{\mathcal{N}} \int \frac{dp\,dq}{h^{3N} N!}\, \frac{3N-2}{3N}\, \frac{1}{K(p)} = \frac{3N}{2} \Bigl(1 - \frac{2}{3N}\Bigr) \langle K(p)^{-1} \rangle \tag{2.6.8}$$

where $\langle K(p)^{-1} \rangle$ is defined by (2.2.13) and the integral in (2.6.8) is extended to
the domain in which $U - DE \le E(p,q) \le U$.

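If, instead, one had proceeded as in the derivation of (2.2.4) in the canonical
ensemble case (see (2.6.4) above) one would have found:

$$\begin{aligned}
\frac{1}{\mathcal{N}} \frac{\partial \mathcal{N}}{\partial V}\, dV &= \frac{N}{\mathcal{N}} \int^* \frac{dp\,dq}{h^{3N} N!} \\
&= N\, \frac{\int^* dp\,dq}{\int^* K(p)\, dp\,dq} \cdot \frac{1}{\mathcal{N}} \int^* K(p)\, \frac{dp\,dq}{h^{3N} N!}
\end{aligned} \tag{2.6.9}$$

where in the last step we multiply and divide by the same quantity and we
use the notation $\int^*$ in (2.6.5). Then (2.6.9) and (2.6.5) imply

$$\frac{1}{\mathcal{N}} \frac{\partial \mathcal{N}}{\partial V}\, dV = \frac{3N}{2}\, \frac{p\,dV}{\langle K(p) \rangle^*} \tag{2.6.10}$$

where $\langle K(p) \rangle^*$ is defined in (2.2.14).


Relation (2.2.12) now follows from (2.2.11), (2.6.8) and (2.6.10).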


(3) We finally deduce (2.2.2) in the simple case (considered in §2.2) of a 
perfect gas, ® = 0. 

If we imagine dividing the six-dimensional phase space describing the
individual particles of the system into cells $C$ having the form:

$$C = \text{set of the } (p,q) \text{ in } R^6 \text{ such that:}\quad
\begin{aligned}
k_\alpha\,\delta p - \delta p/2 &< p_\alpha \le k_\alpha\,\delta p + \delta p/2 \\
k'_\alpha\,\delta q - \delta q/2 &< q_\alpha \le k'_\alpha\,\delta q + \delta q/2
\end{aligned}
\qquad \alpha = 1,2,3 \tag{2.6.11}$$

and $k$, $k'$ are two vectors with integer components, it follows that the energy of
a single particle located in $C$ is $e(C)$:

$$e(C) = \sum_{\alpha=1}^{3} \frac{k_\alpha^2\,(\delta p)^2}{2m}. \tag{2.6.12}$$

Furthermore a microscopic state $\Delta$ of the system can be assigned by giving
the occupation numbers $n_C$ for each cell: they tell us how many particles
occupy a given cell. Then, without combinatorial or analytical errors (see
§2.2):

$$Z(\beta,V) = \sum_{n_C \ge 0,\ \sum_C n_C = N} e^{-\beta \sum_C n_C\, e(C)}. \tag{2.6.13}$$

Taking $L = V^{1/3}$ to be the side of the container and calculating, instead,
the expression afflicted by obvious combinatorial errors:

$$\begin{aligned}
\sum_{n_C \ge 0,\ \sum_C n_C = N} \frac{e^{-\beta \sum_C n_C\, e(C)}}{\prod_C n_C!}
&= \frac{1}{N!} \Bigl(\sum_C e^{-\beta e(C)}\Bigr)^{N}
 = \frac{1}{N!} \Bigl(\frac{L}{\delta q} \sum_{k=-\infty}^{\infty} e^{-\beta k^2 (\delta p)^2/2m}\Bigr)^{3N} \\
&= \frac{V^N}{N!\,h^{3N}} \Bigl(\delta p \sum_{k=-\infty}^{\infty} e^{-\beta k^2 (\delta p)^2/2m}\Bigr)^{3N}
\end{aligned} \tag{2.6.14}$$

which then leads to (2.2.1) (i.e. to the second of (2.3.1), since $\varphi = 0$) if
$h \to 0$ and if one approximates the sum in the last member of (2.6.14) with
the corresponding integral $\int_{-\infty}^{+\infty} e^{-\beta p^2/2m}\,dp$ (committing in this way also
the above described analytic error).
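
The difference between the exact sum over occupation numbers (2.6.13) and the expression with the factors $1/n_C!$ in (2.6.14) can be seen by brute force on a handful of cells. The sketch below is only an illustration: the cell energies are arbitrary made-up numbers, the function names are hypothetical, and the systems are far too small to be physical. It checks the multinomial identity that turns the weighted sum into $(\sum_C e^{-\beta e(C)})^N/N!$.

```python
import math

def occupation_numbers(n_particles, n_cells):
    """Yield all tuples of non-negative integers (n_1, ..., n_cells)
    whose sum is n_particles."""
    if n_cells == 1:
        yield (n_particles,)
        return
    for first in range(n_particles + 1):
        for rest in occupation_numbers(n_particles - first, n_cells - 1):
            yield (first,) + rest

def exact_sum(beta, energies, n_particles):
    """Sum of (2.6.13): one Boltzmann factor per set of occupation numbers."""
    return sum(
        math.exp(-beta * sum(n * e for n, e in zip(occ, energies)))
        for occ in occupation_numbers(n_particles, len(energies)))

def weighted_sum(beta, energies, n_particles):
    """First member of (2.6.14): each term divided by prod_C n_C!."""
    total = 0.0
    for occ in occupation_numbers(n_particles, len(energies)):
        weight = math.prod(math.factorial(n) for n in occ)
        total += math.exp(
            -beta * sum(n * e for n, e in zip(occ, energies))) / weight
    return total

def boltzmann_form(beta, energies, n_particles):
    """Second member of (2.6.14): (sum_C exp(-beta e(C)))**N / N!."""
    single = sum(math.exp(-beta * e) for e in energies)
    return single ** n_particles / math.factorial(n_particles)

energies = [0.0, 0.3, 0.7, 1.1]   # made-up cell energies
print(exact_sum(1.0, energies, 3))
print(weighted_sum(1.0, energies, 3), boltzmann_form(1.0, energies, 3))
```

The two versions of the sum differ only through terms in which some cell is occupied more than once, which is why the comparison reduces to asking how often $n_C > 1$ occurs.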

To compare (2.6.13) with (2.6.14) or (2.2.1) it is necessary to decide
whether the values of $n_C$ that give the main contribution to (2.6.13) are
those for which $n_C = 0, 1$ (in this case (2.6.13), (2.6.14) are good
approximations of each other as well as of (2.2.1) because the factor $n_C!$ has
value 1 in the majority of cases).

We must therefore compute the average value $\bar n_C$ of the quantity $n_C$ with
respect to the canonical distribution; the consistency condition, i.e. the
condition of negligibility of the combinatorial error, will be $\bar n_C \ll 1$.

In the canonical ensemble, by definition (2.1.7), the probability of finding
a particle, with known position, with momentum in $dp$ is the
Maxwell-Boltzmann law:

$$\frac{e^{-\beta p^2/2m}\,dp}{\bigl(\sqrt{2\pi m \beta^{-1}}\bigr)^3}, \tag{2.6.15}$$

hence if $\rho = N/V$ is the system density we shall find:

$$\bar n_C = \rho\,(\delta q)^3\, e^{-\beta p^2/2m}\, \frac{(\delta p)^3}{\bigl(\sqrt{2\pi m \beta^{-1}}\bigr)^3} \le \frac{\rho\,h^3}{\bigl(\sqrt{2\pi m \beta^{-1}}\bigr)^3} \tag{2.6.16}$$

so that $\bar n_C \ll 1$, for all cells $C$, if $T$ is much larger than the temperature
given by (2.2.2).
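
To get a feeling for the size of the bound on the average occupation number, one can evaluate $\rho h^3/(2\pi m k_B T)^{3/2}$ for an ordinary gas. The sketch below is an illustration only: the choice of helium at 273.15 K and atmospheric pressure, the use of the perfect gas law to obtain the density, and the function name `occupation_bound` are assumptions made here, not part of the text.

```python
import math

PLANCK_H = 6.62607015e-34     # J s, exact SI value
BOLTZMANN_K = 1.380649e-23    # J / K, exact SI value

def occupation_bound(number_density, mass, temperature):
    """Upper bound rho * h**3 / (2 pi m k_B T)**1.5 on the average
    occupation number of a phase space cell of volume h**3."""
    thermal_momentum = math.sqrt(
        2.0 * math.pi * mass * BOLTZMANN_K * temperature)
    return number_density * (PLANCK_H / thermal_momentum) ** 3

# Assumed example: helium gas at 273.15 K and 101325 Pa.
helium_mass = 6.6464731e-27   # kg, mass of a helium-4 atom
temperature = 273.15          # K
pressure = 101325.0           # Pa
density = pressure / (BOLTZMANN_K * temperature)  # perfect gas law

bound = occupation_bound(density, helium_mass, temperature)
print(density, bound)
```

Under these assumptions the bound comes out several orders of magnitude below 1, so multiple occupation of a cell is very rare and the combinatorial error is negligible for such a gas.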

It is clear that the error that we call “analytic error” will be negligible if
$\sqrt{\beta/2m}\,\delta p \ll 1$. In the present context we did not fix separately $\delta p$ and
$\delta q$: nevertheless $\delta q$ should be certainly chosen so that $\delta q > \rho^{-1/3}$ = average
inter-particle distance, otherwise it would not make sense to think of the
system as built with particles as separate entities defined in the system.
With this choice of $\delta q$, from $\delta p\,\delta q = h$, it follows that $\delta p = h/\rho^{-1/3}$ and
one sees that the condition $\sqrt{\beta/2m}\,\delta p \ll 1$ is the same as (2.2.2).

§3.1. Equipartition and Other Paradoxes and Applications of Statistical Mechanics


One of the best-known consequences of classical statistical mechanics
is the principle of equipartition of energy: somewhat less well known is that
this principle, after some shining initial successes, reveals itself as the sign of
the inadequacy of statistical mechanics to solve important problems that fall into
its domain. Likewise other well-known important consequences of statistical
mechanics are affected by serious paradoxes and theoretical problems.

Here we shall illustrate some significant examples. 


(1) The Free Gases Specific Heats 


By using the canonical ensemble and assuming that the cell size $h$ is very
small (see §1.2) one easily computes the internal energy for a general model
in which each particle has $\ell$ degrees of freedom and does not interact with
the others.

The $\ell$ degrees of freedom consist of the three barycentric degrees of freedom
plus the $\ell - 3$ internal degrees of freedom of the molecule's internal motion.

One supposes that the energy is a quadratic form in the $\ell$ conjugated momenta
$p_1, p_2, \ldots, p_\ell$ and, possibly, in some of the internal position coordinates:

$$E(p,q) = \sum_{j=1}^{3} \frac{p_j^2}{2m} + \sum_{j=4}^{\ell} \frac{p_j^2}{2M_j(\hat q)} + \sum_{j=\ell_0+1}^{\ell} \frac{1}{2}\, M_j(\hat q)\, \omega_j^2\, q_j^2 \tag{3.1.1}$$

where $p_1, p_2, p_3, q_1, q_2, q_3$ are the momentum and position coordinates for the
particles' barycenter, $m$ is their mass while $p_4, \ldots, q_4, \ldots$ are the momentum
and position coordinates describing the internal degrees of freedom and
$\hat q = (q_4, \ldots, q_{\ell_0})$.

Equation (3.1.1) is the form that one expects for the energy of a molecule
which has a few internal degrees of freedom, precisely $\ell - \ell_0$, to which are
associated oscillatory motions around equilibrium positions (corresponding to
the values $j = \ell_0 + 1, \ldots, \ell$, with respective proper frequencies $\omega_j/2\pi$): they
can therefore be called oscillatory degrees of freedom. The first $\ell_0$ degrees
of freedom describe what we shall call translational degrees of freedom; the
position variable corresponding to a translational degree of freedom is either
a position coordinate for the center of mass, varying in $V$, for $j = 1, 2, 3$, or
an internal angular coordinate, while as a rule the variable $q_j$ conjugated
to a momentum of an oscillatory degree of freedom will always be a variable
describing an internal degree of freedom and it is best thought of as varying
in $(-\infty, +\infty)$.

For instance if the gas consists of point atoms with mass m then $\ell = 3$ 
and $E(p,q) = K(p) = (p_1^2 + p_2^2 + p_3^2)/2m$. If the gas consists of diatomic 
molecules built with two atoms at a fixed distance $\rho$, then the kinetic energy 
is: 

$$K(p,q) = \frac{1}{2m}\left(p_1^2+p_2^2+p_3^2\right) + \frac{1}{2\mu\rho^2}\left(p_4^2 + \frac{p_5^2}{\sin^2\vartheta}\right) \qquad (3.1.2)$$

where m is the total mass and $\mu$ the reduced mass, $m = m_1 + m_2$, $\mu = 
m_1m_2/m$, and $p_4, p_5$ are the momenta conjugated to the variables $\vartheta$ and $\varphi$, 
respectively the latitude (measured from the pole) and the azimuth of the two 
linked atoms. In this case the variables q conjugated to the first three momenta 
are real variables varying in V while the other two variables are angular 
variables. There are five translational degrees of freedom and no oscillatory 
degree of freedom. 
For the perfect gases, for which the total energy is a sum of the kinetic 
energies of the individual particles (i.e. the first term in (3.1.2)), the partition 
function is: 

$$Z(\beta,V) = \int \frac{dp\,dq}{h^{\ell N} N!}\; e^{-\beta\sum_{i=1}^{N} K(p_i,q_i)} \qquad (3.1.3)$$

so that the average energy is computed by using the factorization of the 
integrals and by calculating explicitly first the (Gaussian) integrals over the 
p's and then those over the q of the oscillatory degrees of freedom and finally 
those over the remaining q coordinates (that are trivial if performed after 
those over the p's). 
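More generally if (3.1.1) is the energy $K(p,q)$ of a single molecule and if 
$\hat q$ are the translational position coordinates and $\tilde q$ are the oscillatory ones 
then: 

$$U = \frac{\int \big(\sum_{i} K(p_i,q_i)\big)\, e^{-\beta\sum_{i} K(p_i,q_i)} \prod_i dp_i\,dq_i}{\int e^{-\beta\sum_{i} K(p_i,q_i)} \prod_i dp_i\,dq_i} = N\,\frac{\int d\hat q \big(\int K\,e^{-\beta K}\,dp\,d\tilde q\big)}{\int d\hat q \big(\int e^{-\beta K}\,dp\,d\tilde q\big)}$$

$$\phantom{U} = N\beta^{-1}\,\frac{\int d\hat q\,\big(\frac{\ell}{2} + \frac{\ell-\ell_0}{2}\big)}{\int d\hat q} = N\beta^{-1}\Big(\frac12\ell_0 + \ell - \ell_0\Big). \qquad (3.1.4)$$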


This is an interesting relation because it is independent of the special form 
of (3.1.1) (i.e. independent of the coefficients $M_j(\hat q)$, $\omega_j$, $m$): it says that the 
internal energy of a perfect gas is given by the number of degrees of freedom 
times $1/2\beta = k_BT/2$ (equipartition of the energy among the various degrees 
of freedom and between kinetic and potential energy) counting twice the 
oscillatory degrees of freedom because the latter contribute to the potential 
energy as well. One also says that “there is equipartition between kinetic 
and internal elastic energy”. 
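The counting rule in (3.1.4) can be made concrete with a small sketch. It assumes only the equipartition statement above; the function names and the three sample values of $(\ell, \ell_0)$ are illustrative choices, not part of the original text.

```python
def internal_energy_per_particle(ell, ell0, kT=1.0):
    """Equipartition value (3.1.4): U/N = (ell0/2 + (ell - ell0)) * kT.

    ell  : total number of degrees of freedom of one molecule
    ell0 : number of translational (non-oscillatory) degrees of freedom
    """
    if not 3 <= ell0 <= ell:
        raise ValueError("need 3 <= ell0 <= ell")
    return (0.5 * ell0 + (ell - ell0)) * kT


def molar_heat_capacity_in_R(ell, ell0):
    """C_V/(nR): U is linear in T, so this equals U/(N k_B T)."""
    return internal_energy_per_particle(ell, ell0, kT=1.0)


# Illustrative molecules (assumed for this sketch): (ell, ell0)
cases = {
    "point atoms": (3, 3),
    "rigid diatomic": (5, 5),
    "vibrating diatomic": (6, 5),
}
for name, (ell, ell0) in cases.items():
    print(f"{name}: C_V = {molar_heat_capacity_in_R(ell, ell0)} nR")
```

Each oscillatory degree of freedom is counted with weight 1 rather than 1/2, which is the "counting twice" of the text.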

The constant volume specific heat of a monoatomic gas and that of a gas 
of rigid diatomic molecules are given, respectively, by: 

$$C_V = \frac{\partial U}{\partial T} = \frac32\,nR \quad\text{or}\quad \frac52\,nR \qquad (3.1.5)$$

where $n = N/N_A$ is the number of moles of gas, $N_A$ is Avogadro's number 
and $R = k_B N_A$ is the gas constant. 

Equation (3.1.5) agrees well with the experimental results on rarefied mono- 
atomic gases; the agreement is less good for the diatomic gases, even if 
rarefied. 

In fact (3.1.5) cannot be accepted on general grounds, not even for 
monoatomic gases (even if rarefied), because it is known that gases con- 
sist of atoms with very many degrees of freedom, mostly oscillatory, and 
their specific heat is, nevertheless, $\frac32 nR$. For instance neon could be thought 
of as built with 20 protons and neutrons and 10 electrons, i.e. it would have 
90 degrees of freedom, of which 87 are oscillatory(!). 

But even the simple case of a truly diatomic molecule in which all internal 
degrees of freedom are neglected except the three describing the relative 
position of the two atoms is conceptually unclear: if one assumed rigidity of 
the distance between the two atoms then the specific heat would be $\frac52 nR$; 
if, instead, we admitted that the distance between the two atoms oscillates 
around an equilibrium position (which is more “realistic”), then the specific 
heat would be $\frac72 nR$ because the degrees of freedom would be 6, one of which 
oscillatory. 

It appears, therefore, that “things go as if” some of the internal degrees 
of freedom were less important than others: they are “frozen” and do not 
contribute to the energy equipartition. Equipartition, therefore, would not be 
valid in general, in spite of its being an extremely simple consequence of the 
theory of the canonical ensemble. 


(II) The Specific Heat of Solids. 


Another success-failure of classical statistical mechanics is the theory of the 
specific heat in crystalline solids. A crystalline solid can be modeled as a 
system of particles oscillating elastically around ideal equilibrium positions 
arranged on a regular lattice, e.g. a square lattice with mesh a (to fix ideas). 

It is known from the elementary theory of oscillations that such a system 
is described in suitable normal coordinates by the Hamiltonian: 

$$H = \frac12\sum_{k}\left(p_k^2 + \omega(k)^2\, q_k^2\right), \qquad (p_k, q_k) \in R^6 \qquad (3.1.6)$$

where the sum runs over the triples $k = (k_1, k_2, k_3)$ of integers with $k_i = 
0, 1, \ldots, \sqrt[3]{N}-1$ if N is the number of atoms of the crystal (which we assume 
cubic and with side $\sqrt[3]{N}\,a = L$), and: 

$$\omega(k)^2 = 2c^2 \sum_{i=1}^{3} \frac{1 - \cos(2\pi a k_i/L)}{a^2} \qquad (3.1.7)$$

with c being the sound propagation velocity in the crystal. 

If we could compute the system properties by using the canonical ensemble 
then the internal energy could be computed as: 

$$U = \frac{\int \frac12\sum_{k}\left(p_k^2 + \omega(k)^2 q_k^2\right) e^{-\beta H}\,dp\,dq}{\int e^{-\beta H}\,dp\,dq} = \sum_{k}\frac{3}{\beta} = 3Nk_BT \qquad (3.1.8)$$

because the sum over k, together with the three components of each pair 
$(p_k, q_k)$, concerns 3N oscillators and the calculation proceeds 
as in the discussion of energy equipartition, with the difference 
that now all the 3N degrees of freedom are oscillatory. 

Therefore the specific heat of a crystal should be: 

$$C = 3Nk_B = 3nR \qquad (3.1.9)$$


if n is the number of moles, and this is quite well satisfied at high temper- 
atures (above the solidification temperature, but below liquefaction) and is 
known as the law of Dulong-Petit. 
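
The equipartition step behind the Dulong-Petit value can be checked numerically for a single classical oscillator. The sketch below is only an illustration under stated assumptions: it evaluates the canonical average of $\frac12(p^2+\omega^2q^2)$ by a midpoint rule (the function name, the cutoff and the step count are arbitrary choices) and shows that the result is $1/\beta$ whatever the frequency; 3N such oscillators then give $U = 3Nk_BT$.

```python
import math


def mean_oscillator_energy(beta, omega, n_steps=2000, cutoff=12.0):
    """Canonical average of (p^2 + omega^2 q^2)/2 for one classical oscillator.

    The Boltzmann weight factorizes, so the average is
    <p^2>/2 + omega^2 <q^2>/2, each computed by midpoint quadrature.
    """

    def second_moment(scale):
        # <x^2> for the weight exp(-beta * (scale*x)^2 / 2) on (-L, L)
        sigma = 1.0 / (math.sqrt(beta) * scale)
        half_width = cutoff * sigma
        dx = 2.0 * half_width / n_steps
        num = den = 0.0
        for i in range(n_steps):
            x = -half_width + (i + 0.5) * dx
            w = math.exp(-0.5 * beta * (scale * x) ** 2)
            num += x * x * w
            den += w
        return num / den

    kinetic = 0.5 * second_moment(1.0)
    potential = 0.5 * omega ** 2 * second_moment(omega)
    return kinetic + potential


for omega in (0.3, 1.0, 7.0):
    print(omega, mean_oscillator_energy(beta=2.0, omega=omega))
```

With $\beta = 2$ every frequency returns a value very close to $1/\beta = 0.5$, which is the frequency independence that makes the classical specific heat insensitive to the details of $\omega(k)$.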

If however one takes into account that a typical model of a conducting 
solid consists of N ions on a lattice and N electrons forming a free electron 
gas, then one finds that instead one should perhaps expect a specific heat 
of $3nR + \frac32 nR$. 

Experiments show that the specific heat of crystals at high temperature 
indeed conforms to the Dulong-Petit law. At lower temperatures instead 
the specific heat approaches 0, according to a general principle called the 
third law of Thermodynamics. 

Hence classical mechanics produces erroneous predictions also for a crys- 
talline solid: it looks as if some degrees of freedom are frozen because they 
do not contribute to the specific heat (in other words their contribution to 
the internal energy is the same as that which they would give if their tem- 
perature could be considered zero and staying constantly so in all the system 
transformations: which is not possible because then the system would not 
be in thermal equilibrium). Furthermore at lower temperatures the crystal 
oscillations seem to become less and less describable by classical statistical 
mechanics because the specific heat deviates from the Dulong-Petit law, and 
tends to 0. 


(III) The Black Body. 


A thermodynamic theory of radiation can also be developed on the basis 
of the theory of ensembles, and one reaches disturbing and upsetting con- 
tradictions with the experimental observations as a consequence of classical 
statistical mechanics developed in the previous chapters. 

In fact it was in the theory of the black body where, historically, the 
contradictions were felt most and led to the origin of quantum mechanics. 

Consider a cubic region V filled with electromagnetic radiation in ther- 
mal equilibrium with the surrounding walls with which it is supposed to 
exchange heat. We describe the electromagnetic field by the vector poten- 
tial A and the relations (which are implied by Maxwell’s equations in the 
vacuum): 

$$E = -\frac{1}{c}\frac{\partial A}{\partial t}, \qquad H = \mathrm{rot}\,A, \qquad \mathrm{div}\,A = 0 \qquad (3.1.10)$$

where c is the velocity of light, $c = 2.99 \times 10^{10}$ cm sec$^{-1}$. It is well-known 
that the motion of such a field is described by the Lagrangian: 

$$\mathcal{L} = \frac{1}{8\pi}\int_V \left(E^2 - H^2\right) dx \qquad (3.1.11)$$

regarded as a function of $A$, $\dot A$. 
If L is the side of the volume V occupied by the radiation, which for 
simplicity it is convenient to consider with periodic boundary conditions 
(by identifying the opposite sides of V), it will be possible to write A in 
terms of its Fourier expansion: 

$$A(x) = \frac{1}{\sqrt{L^3}}\sum_{k}\sum_{\alpha=1}^{2} A^{(\alpha)}(k)\, e^{(\alpha)}(k)\, e^{i k\cdot x} \qquad (3.1.12)$$

where $k = \frac{2\pi}{L}\nu$ and $\nu$ is an integer component vector, and $e^{(\alpha)}(k)$ are two 
polarization vectors, with unit length, and orthogonal to k and to each other. 
One finds 

$$\mathcal{L} = \frac12\sum_{k}\sum_{\alpha=1}^{2}\left(\frac{1}{4\pi c^2}\,|\dot A^{(\alpha)}(k)|^2 - \frac{k^2}{4\pi}\,|A^{(\alpha)}(k)|^2\right) \qquad (3.1.13)$$


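Therefore the evolution in time of the field in the cavity can be described 
by the Hamiltonian function: 

$$H = \frac12\sum_{k}\sum_{\alpha=1}^{2}\left(4\pi c^2\, p^{(\alpha)}(k)^2 + \frac{k^2}{4\pi}\, q^{(\alpha)}(k)^2\right) = \frac12\sum_{k}\sum_{\alpha=1}^{2}\left(\bar p^{(\alpha)}(k)^2 + c^2k^2\, \bar q^{(\alpha)}(k)^2\right) \qquad (3.1.14)$$

where the pairs $(p^{(\alpha)}(k), q^{(\alpha)}(k))$ or $(\bar p^{(\alpha)}(k), \bar q^{(\alpha)}(k)) = (\sqrt{4\pi}\,c\,p^{(\alpha)}(k),\ 
q^{(\alpha)}(k)/\sqrt{4\pi}\,c)$ are canonically conjugated coordinates, equivalent because 
the transformation $(p,q) \leftrightarrow (\bar p, \bar q)$ is canonical. 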

Hence an electromagnetic field in a cavity V can be regarded as a system 
of infinitely many independent harmonic oscillators. 

It is, therefore, very tempting to describe this system by statistical mechan- 
ics and to say that at temperature T the microscopic states of the system 
will be distributed according to a canonical distribution and, hence, the 
probability of finding the oscillator with labels $(\alpha, k)$, i.e. with polarization 
$\alpha$ and wave vector k, in the cell $C = d\bar p^{(\alpha)}(k)\,d\bar q^{(\alpha)}(k)$ is: 

$$\frac{e^{-\frac{\beta}{2}\left(\bar p^{(\alpha)}(k)^2 + c^2k^2\,\bar q^{(\alpha)}(k)^2\right)}\; d\bar p^{(\alpha)}(k)\, d\bar q^{(\alpha)}(k)}{\sqrt{4\pi^2\beta^{-2}c^{-2}k^{-2}}} \qquad (3.1.15)$$

It is clear that by assuming (3.1.15) one assumes that the cell size is negligi- 
ble: this usually introduces the two types of errors that have been discussed 
in §2.1 and §2.4. In the present case the combinatorial error is absent be- 
cause this time the oscillators are pairwise distinct. However if $\beta$ is large 
the error due to having neglected the cell size by considering $\bar p^{(\alpha)}(k)$ and 
$\bar q^{(\alpha)}(k)$ as continuous variables is still present and it might substantially 
affect the results. 

If one accepts (3.1.15) the average energy per oscillator will be $k_BT$, by 
the equipartition argument in (I) above, because each oscillator represents 
an oscillatory degree of freedom (see (3.1.14)). 

Therefore it follows that, if $\nu = |k|c/2\pi$ is the frequency of the wave with 
wave number k, the quantity of energy $L^3u_\nu\,d\nu$ corresponding to the oscilla- 
tors with frequency between $\nu$ and $\nu+d\nu$ is related to the number of integer 
vectors n such that $\nu \le |n|c/L \le \nu + d\nu$ via: 

$$L^3u_\nu\,d\nu = \frac{1}{\beta}\,2\cdot\big(\text{number of } n \text{ such that } |n|c/L \in (\nu, \nu+d\nu)\big) = \frac{2}{\beta}\,4\pi\frac{L^3}{c^3}\,\nu^2\,d\nu = L^3\,\frac{8\pi\nu^2}{\beta c^3}\,d\nu \qquad (3.1.16)$$

where in the first step the factor 2 after $\beta^{-1}$ is there because, for each 
k, there are two oscillators with different polarizations and equal average 
energy $k_BT$. Hence the Rayleigh-Jeans formula emerges: 

$$u_\nu = \frac{8\pi\nu^2}{c^3}\,k_BT \qquad (3.1.17)$$

which is manifestly in disagreement with experience, because $\int_0^\infty u_\nu\,d\nu = \infty$ 
and a radiating cavity, in thermal equilibrium, would have infinite energy. 

Experimentally the distribution (3.1.17) is observed only if $\nu$ is small, and 
for large $\nu$ the observations are in contrast with the energy equipartition 
theorem because one finds that $u_\nu$ approaches 0 very quickly as $\nu$ tends to 
infinity. 
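
The mode counting used in (3.1.16) replaces the number of integer vectors in a spherical shell by the shell volume. The following sketch, with hypothetical function names and an arbitrary choice of radii, compares the brute-force count with $\frac{4\pi}{3}(r_{max}^3 - r_{min}^3)$, whose thin-shell limit is the $4\pi r^2\,dr$ of the text (with $r = \nu L/c$).

```python
import math


def count_modes_in_shell(r_min, r_max):
    """Count integer vectors n with r_min <= |n| < r_max by brute force."""
    bound = int(math.ceil(r_max))
    lo2, hi2 = r_min * r_min, r_max * r_max
    count = 0
    for nx in range(-bound, bound + 1):
        for ny in range(-bound, bound + 1):
            for nz in range(-bound, bound + 1):
                s = nx * nx + ny * ny + nz * nz
                if lo2 <= s < hi2:
                    count += 1
    return count


def shell_volume(r_min, r_max):
    """Volume of the spherical shell, the continuum estimate of the count."""
    return 4.0 * math.pi / 3.0 * (r_max ** 3 - r_min ** 3)


exact = count_modes_in_shell(30.0, 33.0)
approx = shell_volume(30.0, 33.0)
print(exact, round(approx), exact / approx)
```

Because the number of modes grows like $\nu^2\,d\nu$ while each mode carries the same energy $k_BT$, the total energy up to a cutoff frequency grows like the cube of the cutoff, which is the divergence noted above.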

We see that classical statistical mechanics in the above three applications 
leads to paradoxes and wrong predictions. In the next section we shall see 
that the paradoxes disappear if the constant h measuring the cell size is 
$h \ne 0$: and it will be possible to say that all contradictions that appear 
in classical statistical mechanics arise when, to simplify the formulae by 
replacing summations with integrals, errors of an analytic and combinatorial 
nature are introduced by taking $h = 0$; see also §2.1, §2.2 and §2.6. 


§3.2. Classical Statistical Mechanics when Cell Sizes Are Not 
Negligible 


In §3.1 and in the previous chapters we always neglected the size h of 
the phase space cells representing the microscopic states of the system. As 
pointed out repeatedly, important errors are introduced in so doing (see §2.1, 
§2.4), which we shall see are ultimately intimately related to the paradoxes 
discussed in the previous sections. 

The main error, however, is due to the fact that if the cells can no longer 
be thought of as points then one should simply not use classical statistical 
mechanics. The previous section shows that the theory leads to results 
that in turn permit us to test whether the cells in phase space can really 
be regarded as points: the disagreement between theory and experimental 
results implies that the cell size is actually accessible to experiment and 
this must necessarily lead us to reformulate the very principles of classical 
mechanics and therefore of statistical mechanics. 

To realize how drastic the changes in the Thermodynamics of 
a system could be in a “quantum regime” in which h cannot be neglected, one 
can simply assume as valid the description of the system in terms 
of cells in phase space and evaluate more accurately the partition sums of 
the various ensembles, avoiding the combinatorial and analytic 
errors that we have described above and that are really negligible only in 
the limit as $h \to 0$. 

Consider as a first example a free gas of identical particles with no internal 
degrees of freedom; and let C be a generic cell of the six dimensional phase 
space in which the states of the single particles can be described: let the 
volume of C be $(\delta p\,\delta q)^3 = h^3$. 

Since the identical particles are indistinguishable, the microscopic con- 
figurations $\Delta$ are determined by the numbers $n_C$ of particles that, in the 
configuration $\Delta$, occupy the cell C. Then 

$$E(\Delta) = \sum_{C} n_C\, e(C) \quad \text{(total energy)}, \qquad N(\Delta) = \sum_{C} n_C \quad \text{(number of particles)} \qquad (3.2.1)$$

where e(C) is the energy of a particle in the cell C. 
Let us study the system in the grand canonical ensemble, where the calcu- 
lations are somewhat simpler. The partition function is then: 

$$\Xi(\beta,\lambda) = \sum_{\{n_C\}} e^{-\beta\lambda\sum_C n_C}\; e^{-\beta\sum_C n_C e(C)} \qquad (3.2.2)$$

where, for each C, $n_C = 0, 1, 2, 3, \ldots$: see §2.5, (2.5.4). 
We perform the summations explicitly, thus avoiding the combinatorial 
and analytical errors whose effects we are investigating. We find 

$$\log \Xi(\beta,\lambda) = -\sum_{C} \log\left(1 - e^{-\beta(\lambda + e(C))}\right) \qquad (3.2.3)$$

and the probability that $n_C = n$ can be immediately computed, see (2.5.3), 

$$p(n; C) = \frac{e^{-\beta\lambda n - \beta n e(C)}}{\left(1 - e^{-\beta\lambda - \beta e(C)}\right)^{-1}} \qquad (3.2.4)$$
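
The equation of state is deduced by expressing $\lambda$ as a function of the density 
$\rho$ and of $\beta$ via 

$$\rho = \frac{1}{V}\sum_{C}\sum_{n} n\, p(n;C) = \frac{1}{V}\sum_{C}\frac{e^{-\beta(\lambda+e(C))}}{1 - e^{-\beta(\lambda+e(C))}} \qquad (3.2.5)$$

and then by replacing $\lambda$ with $\lambda(\beta, \rho)$ in the grand canonical expression of 
the pressure. By recalling that in the grand canonical ensemble the pressure 
is directly related to the partition function, see (2.5.12), we get 

$$\beta\, p(\lambda, \beta) = \frac{1}{V}\log\Xi = -\frac{1}{V}\sum_{C}\log\left(1 - e^{-\beta(\lambda+e(C))}\right) \qquad (3.2.6)$$

and the total energy per unit volume $u_1$ is 

$$u_1 = \frac{1}{V}\sum_{C} e(C)\,\frac{\sum_n n\, e^{-\beta(\lambda+e(C))n}}{\sum_n e^{-\beta(\lambda+e(C))n}} = \frac{1}{V}\sum_{C} e(C)\,\frac{e^{-\beta(\lambda+e(C))}}{1 - e^{-\beta(\lambda+e(C))}} \qquad (3.2.7)$$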


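To appreciate the difference between (3.2.5)-(3.2.7) and the classical perfect 
gas properties it is convenient to imagine that $e(C) = p^2/2m$ if C is a cell 
with center at the point (p, q) and, hence, to neglect the variability of $p^2/2m$ 
in C. 
The latter approximation implies 

$$\beta\, p(\lambda,\beta) = -\int \frac{d^3p}{h^3}\,\log\left(1 - e^{-\beta(\lambda + p^2/2m)}\right)$$

$$\rho(\lambda,\beta) = \int \frac{d^3p}{h^3}\,\frac{e^{-\beta(\lambda + p^2/2m)}}{1 - e^{-\beta(\lambda + p^2/2m)}} \qquad (3.2.8)$$

$$u_1(\lambda,\beta) = \int \frac{d^3p}{h^3}\,\frac{p^2}{2m}\,\frac{e^{-\beta(\lambda + p^2/2m)}}{1 - e^{-\beta(\lambda + p^2/2m)}}$$

Integrating the first of (3.2.8) by parts one gets the relation: 

$$\beta\, p(\lambda,\beta) = \frac23\,\beta\, u_1 \qquad (3.2.9)$$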

The neglect of the variability of $p^2/2m$ in C introduces an error; however one 
can check, without difficulty, that it does not alter the qualitative properties 
of (3.2.5)-(3.2.7) which we shall discuss shortly (the approximation only 
simplifies the analysis, to some extent). 

The most relevant phenomenon is the Bose condensation: (3.2.4) shows 
that the parameter $\lambda$ must be such that $-\lambda < \min_C e(C) = 0$. Hence, as 
appears from (3.2.8), the maximum density $\rho_0(\beta)$ of the system seems to 
correspond to $\lambda = 0$: 

$$\rho_0(\beta) = \int \frac{d^3p}{h^3}\,\frac{e^{-\beta p^2/2m}}{1 - e^{-\beta p^2/2m}} \qquad (3.2.10)$$

which looks incorrect because the density can be prescribed a priori, by 
assigning the number of particles, hence it cannot be bounded above. 

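But the density can be larger than $\rho_0(\beta)$ because (3.2.4) does not make 
sense if $\lambda \le 0$, e(C) = 0. Interpreted literally (3.2.4), for $\lambda > 0$, shows that 
the particle number in a cell C with e(C) = 0 is 

$$\sum_{n} n\, p(n;C) = \frac{\sum_n n\, e^{-\beta\lambda n}}{\sum_n e^{-\beta\lambda n}} = \frac{1}{\beta}\frac{d}{d\lambda}\log\left(1 - e^{-\beta\lambda}\right) = \frac{e^{-\beta\lambda}}{1 - e^{-\beta\lambda}} \xrightarrow[\lambda\to0^+]{} \infty \qquad (3.2.11)$$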

The correct interpretation of (3.2.10) and of the last remark is that the 
cells with e(C) > 0 can contribute at most the quantity $\rho_0(\beta)$ to the density 
$\rho$: the remaining part of a larger density, $\rho - \rho_0(\beta)$, is due, if 
$\rho > \rho_0(\beta)$, to the particles that are in the cells C with e(C) = 0! Note that 
there are many such cells because they must only have 0 momentum but 
the spatial centers of the cells can be anywhere in the container V. 

This in fact means that the most appropriate way to describe the states of 
this system should be the canonical ensemble. But from the above discussion 
we can imagine describing a state with density $\rho > \rho_0(\beta)$ in the grand 
canonical ensemble by setting $\lambda = 0$ and then by imagining that $(\rho - \rho_0(\beta))V$ 
particles are in the cells C with e(C) = 0. 
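
That $\rho_0(\beta)$ in (3.2.10) is finite can be checked numerically. The sketch below is an illustration under stated assumptions: after the change of variable $p = \sqrt{m/\beta}\,s$ the integral becomes $(mk_BT)^{3/2}h^{-3}\int d^3s\,(e^{s^2/2}-1)^{-1}$, and the dimensionless integral is compared with the value $(2\pi)^{3/2}\zeta(3/2)$ obtained by expanding the integrand in a geometric series. Since $\rho_0 \propto T^{3/2}$, if $T_c$ denotes the temperature at which $\rho_0 = \rho$ (a symbol introduced here only for the example), the fraction of particles in the cells with e(C) = 0 is $1 - (T/T_c)^{3/2}$.

```python
import math

ZETA_3_2 = 2.612375348685488  # Riemann zeta(3/2), a standard constant


def rho0_scaled(n_steps=100_000, s_max=20.0):
    """Midpoint-rule value of 4*pi * int_0^s_max s^2 / (exp(s^2/2) - 1) ds.

    With p = sqrt(m/beta) * s this equals h^3 * rho_0(beta) / (m k_B T)^(3/2),
    i.e. the integral (3.2.10) in dimensionless form.
    """
    ds = s_max / n_steps
    total = 0.0
    for i in range(n_steps):
        s = (i + 0.5) * ds
        total += s * s / math.expm1(0.5 * s * s)
    return 4.0 * math.pi * total * ds


def condensate_fraction(t_over_tc):
    """Fraction of particles in the e(C)=0 cells, given rho_0(T_c) = rho."""
    if t_over_tc >= 1.0:
        return 0.0
    return 1.0 - t_over_tc ** 1.5


numeric = rho0_scaled()
closed_form = (2.0 * math.pi) ** 1.5 * ZETA_3_2
print(numeric, closed_form)
print(condensate_fraction(0.5))
```

The integrand tends to the finite value 2 as $s \to 0$, which is why the cells with e(C) > 0 can only hold a bounded density.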

It is important to remark that since $\rho_0(\beta) \to 0$ for $\beta \to \infty$ the phenomenon 
of Bose condensation is always important at low temperature if the total 
density is kept fixed. And it is clear that the particles that are in the cells C 
with e(C) = 0 have zero momentum and therefore they do not contribute to 
the internal energy nor to the pressure nor to the specific heat at constant 
volume. 

In particular if we wish to examine the specific heat at constant volume 
when $T \to 0$ we can note that, as soon as T is so small that $\rho_0(\beta) < \rho$, the 
internal energy becomes 

$$u_1 = \int \frac{d^3p}{h^3}\,\frac{p^2}{2m}\,\frac{e^{-\beta p^2/2m}}{1 - e^{-\beta p^2/2m}} = \beta^{-5/2}\,\sigma = (k_BT)^{5/2}\,\sigma \qquad (3.2.12)$$

(only cells with $e(C) \ne 0$ contribute) with 

$$\sigma = \int \frac{d^3p}{h^3}\,\frac{p^2}{2m}\,\frac{e^{-p^2/2m}}{1 - e^{-p^2/2m}}. \qquad (3.2.13)$$

Hence 

$$C_V = \frac{\partial u_1}{\partial T} = \text{constant}\; T^{3/2} \quad \text{if } \rho > \rho_0(\beta) \qquad (3.2.14)$$

which shows how in the perfect gas that we are studying the equipartition 
result $C_V = 3nR/2$ is no longer true: instead one finds $C_V \to 0$ for $T \to 0$! 
At low temperatures equipartition fails if one takes $h \ne 0$ seriously. 

Another example in which h cannot be neglected is the case in which the gas 
particles are imagined to interact in a very simple way, conceivable although 
not usual in classical mechanics: suppose that the particles “repel” each 
other in the sense that they cannot occupy the same cell in phase space 
so that one cannot find two or more particles in a given cell. The unusual 
nature of this force is expressed by its dependence on velocity (because it 
generates a “hard core” in phase space). 

In the latter case the partition function is (3.2.2) with the condition that 
$n_C = 0, 1$. Hence: 

$$\Xi(\beta,\lambda) = \prod_{C}\left(1 + e^{-\beta(\lambda + e(C))}\right) \qquad (3.2.15)$$

and the probability that $n_C = n$ is, instead of (3.2.4), 

$$p(n; C) = \frac{e^{-\beta(\lambda+e(C))n}}{1 + e^{-\beta(\lambda+e(C))}}, \qquad n = 0, 1 \qquad (3.2.16)$$

and (3.2.5)-(3.2.8) change accordingly. 

This gas does not resemble at all the classical perfect gas and at low tem- 
perature it exhibits the phenomenon of Fermi condensation; one sees in fact 
that 

$$p(1; C) \xrightarrow[\beta\to\infty]{} \begin{cases} 1 & \text{if } e(C) < -\lambda \\ 0 & \text{if } e(C) > -\lambda \end{cases} \qquad (3.2.17)$$

so that at low temperature only the cells with $p^2/2m < -\lambda$ are occupied: 
their momenta fill a sphere in momentum space (Fermi sphere). Note that 
if $\lambda > 0$ the system density tends to 0 as $T \to 0$. If one wants to keep a 
constant density while $T \to 0$ one must fix $\lambda < 0$. In fact if $\lambda < 0$ the 
density is such that: 

$$\rho(\lambda,\beta) = \int \frac{d^3p}{h^3}\,\frac{e^{-\beta(\lambda + p^2/2m)}}{1 + e^{-\beta(\lambda + p^2/2m)}} \xrightarrow[\beta\to\infty]{} \frac{4\pi}{3h^3}\sqrt{(-2m\lambda)^3}. \qquad (3.2.18)$$

Hence if $\beta \to \infty$ and the density stays constant (i.e. $\lambda < 0$) one finds the 
internal energy and the specific heat at constant volume via the relations: 

$$U = V\int \frac{p^2}{2m}\,\frac{d^3p}{h^3}\,\frac{e^{-\beta(\lambda + p^2/2m)}}{1 + e^{-\beta(\lambda + p^2/2m)}}$$

$$C_V = \Big(\frac{\partial U}{\partial T}\Big)_V = V\int \frac{p^2}{2m}\,\frac{d^3p}{h^3}\,\frac{(\lambda + p^2/2m)\,e^{-\beta(\lambda + p^2/2m)}}{k_BT^2\left(1 + e^{-\beta(\lambda + p^2/2m)}\right)^2} \qquad (3.2.19)$$

and an elementary analysis of the integrals leads to the asymptotic formula 

$$C_V = \sigma V T \quad \text{as } T \to 0 \qquad (3.2.20)$$

with a suitable $\sigma$. Hence also this system behaves in a different way if 
compared to the classical perfect gas at low temperature. In particular 
(3.2.20) shows that equipartition of energy no longer holds (because $C_V \ne 
3nR/2$). 
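
The sharpening of the occupation probability (3.2.16) into the step function (3.2.17) is easy to see numerically. The sketch below is illustrative only: the function name and the sample values ($\lambda = -1$ in arbitrary energy units, so that the edge of the Fermi sphere sits at energy 1) are assumptions made for the example.

```python
import math


def fermi_occupation(energy, lam, beta):
    """p(1; C) from (3.2.16): e^{-x} / (1 + e^{-x}) with x = beta*(lam + energy).

    The two branches are algebraically identical; each avoids overflow in exp.
    """
    x = beta * (lam + energy)
    if x >= 0.0:
        ex = math.exp(-x)
        return ex / (1.0 + ex)
    return 1.0 / (1.0 + math.exp(x))


lam = -1.0  # assumed value; cells with energy below -lam = 1 fill up
for beta in (1.0, 10.0, 1000.0):
    row = [round(fermi_occupation(e, lam, beta), 4) for e in (0.5, 1.0, 1.5)]
    print(beta, row)
```

As $\beta$ grows the occupation below the edge tends to 1 and above it tends to 0, while exactly at $e(C) = -\lambda$ it stays at 1/2 for every $\beta$.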

The conditions under which the nonnegligible size of h makes itself felt, so 
that the gas shows the properties just exhibited (which are completely 
different from those of the classical perfect gas and appear, of course, only 
for small T), have been discussed 
in Chap.II, (2.2.2) and §2.6. We just recall that we obtained an estimate 
of the value of the temperature below which the effects of the nonvanishing 
cell size begin to be felt as: 

$$T_q = h^2/(m k_B \rho^{-2/3}). \qquad (3.2.21)$$

One can check, on the above formulae, that the latter value for $T_q$ coincides 
in order of magnitude, as we should expect, with the value of the temperature 
such that $\rho_0(\beta_q) = \rho$ in the first case and such that $-\lambda\beta \simeq 1$ in the second. 

It is common to say that the condition $T > T_q$ is the condition that the per- 
fect gas does not present degeneration phenomena due to the nonnegligible 
size of h. 

It is not difficult to realize that the degeneration due to the fact that h 
is appreciably $\ne 0$ can be the mechanism that permits us to avoid all the 
paradoxes due to energy equipartition. 

For instance in the theory of a crystal, the electron contribution to the spe- 
cific heat is negligible because the value of the temperature below which the 
electron gas presents degeneration phenomena (with consequent smallness 
of the specific heat, see (3.2.14) or (3.2.20)) can be estimated on the basis 
of (3.2.21) and it gives a very high value of $T_q$. 

By using (3.2.21) and $m = 0.91 \times 10^{-27}$ g, $\rho = 10^{22}$ cm$^{-3}$ (density of the 
free electrons in iron) one finds $T_q$: 

$$T_q = 1/k_B\beta_q = 1.6 \times 10^{5}\ \text{K}. \qquad (3.2.22)$$
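
The estimate (3.2.22) follows from (3.2.21) by direct arithmetic in CGS units. The sketch below reproduces it; the value of Boltzmann's constant is an assumption added here (it is not quoted in the text), and the function name is illustrative.

```python
H_PLANCK = 6.62e-27     # erg sec, the value of h used in the text
K_BOLTZMANN = 1.38e-16  # erg / K (assumed here; not quoted in the text)


def degeneracy_temperature(mass_g, density_per_cm3):
    """T_q = h^2 / (m k_B rho^(-2/3)), equation (3.2.21), in kelvin."""
    return H_PLANCK ** 2 * density_per_cm3 ** (2.0 / 3.0) / (mass_g * K_BOLTZMANN)


t_q = degeneracy_temperature(mass_g=0.91e-27, density_per_cm3=1e22)
print(f"T_q = {t_q:.2e} K")
```

Since room temperature is smaller than this value by more than two orders of magnitude, the conduction electrons are deep in the degenerate regime, which is the point made above.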


More generally one can think that, if a given system consists of various 
particles, each with several internal degrees of freedom, then at a given 
temperature only some degrees of freedom are nondegenerate: equipartition 
then takes place “between” them, while the other particles remain in a de- 
generate state and therefore produce novel phenomena, among which the 
lack of contributions to the specific heat. 

A very interesting example is that of black body radiation theory: in fact 
the black body is a system with infinitely many independent degrees of 
freedom, most of which are in a state of extreme degeneracy (see below), 
so that the equipartition of the energy takes place only between a finite 
number of degrees of freedom. 

In §3.1 we saw that a radiating cavity can be regarded as a set of infinitely 
many harmonic oscillators with Hamiltonian (3.1.14): 

$$H = \frac12\sum_{k}\sum_{\alpha=1}^{2}\left(\bar p^{(\alpha)}(k)^2 + c^2k^2\,\bar q^{(\alpha)}(k)^2\right) \qquad (3.2.23)$$

where $\bar p^{(\alpha)}(k)$ and $\bar q^{(\alpha)}(k)$ are canonical variables. 

The canonical distribution attributes to the configurations in which the 
oscillator with polarization $\alpha$ and wave number k is in the cell $C^{\alpha,k}_{m,n}$ with 
center $(\bar p^{(\alpha)}(k), \bar q^{(\alpha)}(k)) = (m\,\delta p, n\,\delta q)$ (m, n integers), the probability: 

$$p(C^{\alpha,k}_{m,n}) = \frac{e^{-\frac{\beta}{2}\left(m^2\delta p^2 + c^2k^2\delta q^2 n^2\right)}}{\sum_{m',n'} e^{-\frac{\beta}{2}\left(m'^2\delta p^2 + c^2k^2\delta q^2 n'^2\right)}} \qquad (3.2.24)$$

where we do not neglect the dimensions of $C^{\alpha,k}_{m,n}$, and we take seriously the 
canonical ensemble, forgetting that its use is doubtful when the cell sizes 
are not negligible and that in such cases statistical mechanics should be 
completely reformulated. 

By repeating the analysis followed to obtain (3.2.21), see §2.2 and §2.6, one 
easily finds the condition under which the size of h is negligible: 

$$\sqrt{\beta}\,\delta p \ll 1, \qquad \sqrt{\beta}\,c|k|\,\delta q \ll 1. \qquad (3.2.25)$$

Without explicitly fixing the values of $\delta p$ and $\delta q$ we see that (3.2.25) will 
imply, in particular (multiplying corresponding sides of the two conditions), 
that $\beta$ is too large for a classical statistical description of the oscillators 
with frequency $\nu$ if 

$$\beta\,c|k|\,\delta p\,\delta q = \beta h c|k| = 2\pi\beta h\nu > 1 \qquad (3.2.26)$$

where $\nu = c|k|/2\pi$. 

We must therefore expect that, given h, the high-frequency oscillators (with 
$|k| > 1/hc\beta$ or $h\nu > \beta^{-1} = k_BT$) will be degenerate, i.e. they cannot be 
described without taking into account the cell sizes. Note that since $\nu$ 
can be as large as we want there will always be frequencies $\nu$ for which 
$h\nu > \beta^{-1} = k_BT$. 

If we take $h = 6.62 \times 10^{-27}$ erg sec and T = 6000 K (temperature at the 
surface of the Sun) one sees that the degenerate frequencies are all those 
greater than: 

$$\nu_0 = 1/h\beta = 1.25 \times 10^{14}\ \text{cycles sec}^{-1} \qquad (3.2.27)$$

which can be compared, for the purpose of an example, with the fre- 
quency of green light (where the Sun spectrum has its maximum) $\nu_{green} = 
0.6 \times 10^{15}$ cycles sec$^{-1}$. 

The latter numerical values explain why the degeneration phenomenon has 
been so “easy” to observe, or “so conspicuous”, in the black body radia- 
tion and why it has played such a big role in the development of quantum 
mechanics. 
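
The number in (3.2.27) is again plain arithmetic. The sketch below recomputes $\nu_0 = k_BT/h$ for T = 6000 K and the ratio $\beta h\nu$ for green light; the value of Boltzmann's constant is an assumption added for the example, and the names are illustrative.

```python
H_PLANCK = 6.62e-27     # erg sec, the value of h used in the text
K_BOLTZMANN = 1.38e-16  # erg / K (assumed here; not quoted in the text)


def degeneracy_frequency(temperature_k):
    """nu_0 = 1/(h beta) = k_B T / h, equation (3.2.27), in cycles per second."""
    return K_BOLTZMANN * temperature_k / H_PLANCK


nu0 = degeneracy_frequency(6000.0)
nu_green = 0.6e15              # cycles / sec, from the text
x_green = nu_green / nu0       # equals beta * h * nu for green light
print(f"nu_0 = {nu0:.3e}, beta*h*nu_green = {x_green:.2f}")
```

For green light $\beta h\nu$ is between 4 and 5, so the visible part of the solar spectrum already lies in the degenerate range $h\nu > k_BT$.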

The average energy of a nondegenerate oscillator is, by energy equipartition, 
$k_BT = 1/\beta$, while if we do not neglect the possibility of degeneration this 
energy is: 

$$u(k,\alpha) = \sum_{m,n} \frac12\left(m^2\delta p^2 + c^2|k|^2 n^2\delta q^2\right) p(C^{\alpha,k}_{m,n}) \qquad (3.2.28)$$

with $p(C^{\alpha,k}_{m,n})$ as expressed by (3.2.24). 
The quantity of energy in the radiation with frequency between $\nu$ and 
$\nu + d\nu$ is then (see (3.1.16)): 

$$L^3 u_\nu\, d\nu = \frac{4\pi\nu^2}{c^3}\, d\nu\, L^3 \sum_{\alpha=1}^{2} u(k,\alpha) \qquad (3.2.29)$$

where $|k| = 2\pi\nu/c$. If $\nu \ll 1/\beta h$ equipartition holds, as one can compute 
explicitly using (3.2.28), (3.2.24); and (3.2.29) is simply: 

$$u_\nu = \frac{8\pi\nu^2}{c^3}\,k_BT. \qquad (3.2.30)$$


To discuss the high-frequency case $\nu \gg 1/\beta h$ it is necessary to fix $\delta p$ and 
$\delta q$: but in classical mechanics one cannot give a clear criterion for choosing 
$\delta p$ or $\delta q$. Hence for concreteness we shall choose $\delta p$ and $\delta q$ so that: 

$$\delta p\,\delta q = h, \quad \delta p = \vartheta^2 c|k|\,\delta q = \vartheta^2\, 2\pi\nu\,\delta q \qquad\Longrightarrow\qquad \delta p = \vartheta\sqrt{2\pi\nu h}, \quad \delta q = \vartheta^{-1}\sqrt{h/2\pi\nu} \qquad (3.2.31)$$

with $\vartheta \sim 1$. Although this is a “natural” choice because it makes approx- 
imately equal the two addends in (3.2.28) for m = n = 1 (exactly equal 
if $\vartheta = 1$), it is nevertheless arbitrary. The results are qualitatively inde- 
pendent of the choice of $\vartheta$, but their quantitative aspects do depend on its 
value. 
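
The sums defined by (3.2.24) and (3.2.28) can be evaluated directly. The sketch below is an illustration under the cell choice (3.2.31): in units of $k_BT$ the energy of the cell (m, n) is $\pi x(\vartheta^2m^2 + \vartheta^{-2}n^2)$ with $x = \beta h\nu$, the weights factorize, and the two one-dimensional sums are computed separately. The function name and the truncation tolerance are assumptions made for the example.

```python
import math


def mean_energy_rectangular_cells(x, theta=1.0, tol=1e-16):
    """beta * u(k, alpha) from (3.2.24) and (3.2.28) with cells (3.2.31).

    x = beta*h*nu. The cell (m, n) has energy, in units of k_B T,
    pi*x*(theta^2 m^2 + n^2/theta^2), so the average is a sum of two
    one-dimensional averages over the integers.
    """

    def one_dimensional_mean(a):
        # average of a*j^2 over integers j with weights exp(-a*j^2)
        norm, total, j = 1.0, 0.0, 1
        while True:
            w = math.exp(-a * j * j)
            norm += 2.0 * w
            total += 2.0 * a * j * j * w
            if w < tol:
                break
            j += 1
        return total / norm

    a_p = math.pi * x * theta ** 2
    a_q = math.pi * x / theta ** 2
    return one_dimensional_mean(a_p) + one_dimensional_mean(a_q)


for x in (0.01, 0.1, 1.0, 5.0):
    print(x, mean_energy_rectangular_cells(x), mean_energy_rectangular_cells(x, theta=1.3))
```

For small x the result is close to 1, i.e. the equipartition value $k_BT$, while for $x \gg 1$ it is exponentially small, and the size of the suppression depends visibly on $\vartheta$.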

From (3.2.28), (3.2.31), one deduces with a brief analysis of the series on 
m and n, that if $\beta h\nu \gg 1$ and $\bar\vartheta^2 = \min(\vartheta^2, \vartheta^{-2})$, 

$$u(k,\alpha) = \tilde h\nu\, e^{-\beta\tilde h\nu}, \quad \tilde h = 2\pi\bar\vartheta^2 h \quad \text{if } \vartheta \ne 1$$

$$u(k,\alpha) = 2\tilde h\nu\, e^{-\beta\tilde h\nu}, \quad \tilde h = 2\pi h \quad \text{if } \vartheta = 1 \qquad (3.2.32)$$

so that (3.2.29) yields (for $\vartheta \ne 1$ and $\tilde h = 2\pi h\bar\vartheta^2$) the distribution, identical 
to Wien's distribution, 

$$u_\nu = \frac{8\pi\nu^2}{c^3}\,\tilde h\nu\, e^{-\beta\tilde h\nu} \qquad (3.2.33)$$

which shows that the energy present at high frequency is far below the 
equipartition value and, in fact, the total energy of the electromagnetic 
field in thermal equilibrium is finite, unlike what would happen if every 
oscillator had the same average energy. Of course (3.2.33) cannot really 
be taken seriously because, as repeatedly remarked already, the very fact 
that we do not neglect the size of h shows that it would be necessary to 
reinvestigate the basic laws of motion, based on (3.2.23), that we are using. 
A further indication that (3.2.33) cannot be considered a correct distri- 
bution is seen also by noting that by changing by a small amount the cell 
shape (e.g. take $\vartheta = 1$ or $\vartheta \ne 1$ in (3.2.32)) one would find a quantitatively 
different result. 

For instance Planck used phase space cells $C^{\alpha,k}_n$ (for single particles) with 
the shape of an elliptic annulus defined by: 

$$(n-1)h\nu < \frac12\left(\bar p^{(\alpha)}(k)^2 + c^2|k|^2\,\bar q^{(\alpha)}(k)^2\right) \le nh\nu, \qquad n \text{ integer} > 0 \qquad (3.2.34)$$

and area h; i.e. he imagined that the cells were defined by the value of the 
energy (and more precisely of the action to which, in this case, the energy 
is proportional) rather than by the momentum and position. Note that this 
shape is “very” different from the parallelepipedal shapes used so far. 

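In this way (3.2.24) and (3.2.28) are replaced by: 

$$p(C^{\alpha,k}_n) = \frac{e^{-\beta h\nu n}}{\left(1 - e^{-\beta h\nu}\right)^{-1}}, \qquad u(k,\alpha) = \sum_{n\ge0} \frac{n h\nu\, e^{-\beta h\nu n}}{\left(1 - e^{-\beta h\nu}\right)^{-1}} = \frac{h\nu\, e^{-\beta h\nu}}{1 - e^{-\beta h\nu}}$$

which leads to the Planck distribution: 

$$u_\nu = \frac{8\pi\nu^2}{c^3}\,\frac{h\nu}{e^{\beta h\nu} - 1}$$

for the black body radiation. 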
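
The average energy obtained with Planck's cells is a geometric-series sum and can be checked against its closed form. The sketch below works in units of $k_BT$ with $x = \beta h\nu$; the function names and the truncation of the series are assumptions made for the example.

```python
import math


def planck_mean_energy(x, n_max=2000):
    """beta*u from the series sum_n n*x*e^{-n x} * (1 - e^{-x}), truncated at n_max."""
    series = sum(n * x * math.exp(-n * x) for n in range(n_max))
    return series * (1.0 - math.exp(-x))


def planck_closed_form(x):
    """x*e^{-x}/(1 - e^{-x}) = x/(e^x - 1); also Planck/Rayleigh-Jeans for u_nu."""
    return x / math.expm1(x)


for x in (0.1, 1.0, 4.8, 10.0):
    print(x, planck_mean_energy(x), planck_closed_form(x))
```

For $x \ll 1$ the ratio to the equipartition value tends to 1, while for $x \gg 1$ it behaves like $x\,e^{-x}$; unlike the Rayleigh-Jeans expression, the resulting $u_\nu$ is integrable in $\nu$.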

Obviously on the basis of classical statistical mechanics it is impossible to 
decide which is the correct radiation distribution: we can only say that if 
in fact phase space cells cannot be chosen smaller than a minimum size, 
then it will be impossible to accept equipartition and, on the contrary, the 
high-frequency oscillators will have a very low average energy. 

The experimental result that radiation in thermal equilibrium conforms to 
the Planck distribution is an indication of the non-indefinite divisibility of 
phase space. And the black body is a system particularly apt to reveal the 
discrete structure of phase space, because it consists of an infinite number 
of oscillators with frequency v greater than an arbitrarily pre-fixed value 
vo, and therefore it contains an infinite number of degenerate oscillators if h 
is positive, no matter how small. In fact degeneracy happens to be already 
visible in “everyday life”. 

It is also possible, and of major interest, to investigate which would be the 
predictions of a strict interpretation of radiation and specific heat theories in 
terms of classical mechanics, i.e. assuming h = 0 in spite of our arguments 
in Chap.I and above, on the “unphysical nature” of such an assumption. 

This is however a very difficult task and many open problems remain. 
Therefore I can only quote here a few papers that after the work [FPU55] 
have tried attacking the problem and brought a wealth of new ideas and 
results on the behavior of large assemblies of purely classical point particles 
in a situation in which the temperature is lower than the value (3.2.21) 
where problems with a classical interpretation begin to appear. See [GS72], 
[BGG84], and for the more recent developments see [BGG93], [Be94], [Be97]. 


§3.3. Introduction to Quantum Statistical Mechanics 


In a sense quantum statistical mechanics is very similar to classical sta- 
tistical mechanics: this should come as no surprise as both theories aim at 
explaining the same macroscopic phenomena. 

As we have seen in §3.2 some of the main phenomena that receive their ex- 
planation in the framework of quantum mechanics (like the low-temperature 
specific heat of solids or the black body radiation or the perfect gases spe- 
cific heats) can in a very qualitative and empirical sense be guessed also in 
classical statistical mechanics: and historically this actually happened, and 
sparked the genesis of quantum mechanics. 

Phase space no longer has a meaning and one only thinks of observable 
quantities: which are described mathematically by some (few, not necessar- 
ily all, which would lead to conceptual problems, [VN55], [BH93], [Be87]) 
linear operators on a Hilbert space, usually infinite dimensional. On this 
point a long discussion could be started by arguing that this is in fact not 
really necessary and the dimension might be chosen finite and its size would 
then become a parameter that would play in quantum statistical mechanics 
a role similar to the phase space cells size h in classical mechanics. 

However, since no “crisis” is in sight which would lead to a new mechanics, 
at least no crisis that is as obvious and as universally recognized as a problem 
like the black body radiation laws were in the early days of the twentieth 
century, we shall not dwell on the exercise of trying to understand how 
much the theory depends on the dimension of the Hilbert space or, for that 
matter, on the continuity of the space of the positions that particles can 
occupy (which one may, also, wish to challenge). 

Statistical ensembles are defined in terms of the Schrödinger operator
describing the observable energy and usually denoted H. But their elements
$\rho$, rather than as probability distributions on phase space, are defined
as rules to compute the equilibrium averages of observables (which is
essentially what they are used for also in classical statistical mechanics).

An ensemble $\mathcal{E}$ will, then, be a collection of rules $\rho$ each of
which allows us to compute the average value that an observable has in the
macroscopic equilibrium state $\rho \in \mathcal{E}$: the element $\rho$ should
be stationary with respect to quantum mechanical time evolution, as in the
corresponding classical cases.
The mathematical notion necessary to define a “rule to evaluate averages” 
of observables represented by self-adjoint operators, and therefore the analog 
of the classical probability distributions on phase space, is that of density 
matrix. If A is an observable and H is the energy operator that corresponds 
to N particles in a container V, one defines the canonical ensemble as the 
collection of all the density matrices which have the form: 


$$\rho = \mathrm{const}\; e^{-\beta H} \qquad (3.3.1)$$


and the average value of the observable A in the macroscopic state
represented by (3.3.1), parameterized by $\beta$ and V, as in the analogous
case of the classical statistical mechanics, is defined by


$$\overline{A} = \frac{\mathrm{Tr}\, A\, e^{-\beta H}}{\mathrm{Tr}\, e^{-\beta H}} \qquad (3.3.2)$$


where Tr is the trace operation. 

As hinted above, for all practical purposes (most) operators can be regarded
as big self-adjoint matrices of large but finite dimensions, so that the trace
makes sense: after some practice one in fact understands how to avoid 
annoying errors and pitfalls linked to this view of the operators, much in the 
same way in which one learns how to avoid differentiating non differentiable 
functions in classical mechanics. 

Thermodynamics models are deduced from the (quantum) canonical par- 
tition function: 


$$Z(\beta, V) = \mathrm{Tr}\, e^{-\beta H} \qquad (3.3.3)$$


and now $1/(\beta k_B)$ is interpreted as the temperature, while the free
energy is defined by
$f(\beta, v) = \lim_{V\to\infty} -\beta^{-1} \frac{1}{N} \log Z(\beta, V)$
in the limit $V \to \infty$, $V/N \to v$ (thermodynamic limit).
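
As a minimal sketch of (3.3.1)-(3.3.3), assume a hypothetical system whose
energy operator is already diagonal, with finitely many eigenvalues, so that
all operators are finite matrices in the sense hinted above and the trace is
a finite sum. The function names, the two-level example and the choice of
units with $k_B = 1$ are our own illustrative assumptions, not part of the
text.

```python
import math

def partition_function(energies, beta):
    """Z = Tr exp(-beta H) for a diagonal H with the given eigenvalues."""
    return sum(math.exp(-beta * e) for e in energies)

def canonical_average(diag_a, energies, beta):
    """Tr(A exp(-beta H)) / Tr(exp(-beta H)).

    Only the diagonal elements of A in the energy eigenbasis enter the trace,
    so A is passed as the list of those diagonal elements (an assumption made
    to keep the sketch free of linear-algebra libraries).
    """
    weights = [math.exp(-beta * e) for e in energies]
    return sum(a * w for a, w in zip(diag_a, weights)) / sum(weights)

def free_energy_per_particle(energies, beta):
    """f = -(1/beta) log z for N independent copies, since Z = z**N."""
    return -math.log(partition_function(energies, beta)) / beta

# Hypothetical two-level system with energies 0 and gap (illustrative values).
gap, beta = 1.0, 2.0
levels = [0.0, gap]
mean_energy = canonical_average(levels, levels, beta)    # observable A = H
print(mean_energy, gap / (math.exp(beta * gap) + 1.0))   # the two numbers agree
print(free_energy_per_particle(levels, beta))
```

For N independent copies of such a system the partition function factorizes,
which is why the limit defining f is trivial in this toy case.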

Note that the absolute temperature is no longer defined as proportional
to the average kinetic energy: rather it is identified as proportional to the
parameter $\beta^{-1}$ that appears in (3.3.2): see also §2.1 and §3.1 for a
related comment on this difference. In some respects this is the really major
novelty in quantum statistical mechanics.

One can also define quantum microcanonical or grand canonical ensembles 
and check their equivalence, sometimes even rigorously under suitable extra 
assumptions like stability and temperedness (see (2.2.17), (2.2.18)), [Ru69]. 

For instance, considering N identical particles of mass m in a cubic
container V, the Hilbert space is the space $L_2(V^N)$ of symmetric (or
antisymmetric) square integrable functions of the N position coordinates
$(q_1, \ldots, q_N)$ and the energy operator is


$$H = -\frac{\hbar^2}{2m} \sum_{i=1}^{N} \Delta_{q_i} + \Phi(q) \qquad (3.3.4)$$


where $\Phi(q) = \sum_{i<j} \varphi(q_i - q_j)$ is the potential energy of the
interaction, $\hbar = h/2\pi$ if h is Planck’s constant and $\Delta_{q_i}$ is
the Laplace operator with respect to the i-th particle coordinate and with
suitable boundary conditions (e.g. periodic or Dirichlet boundary conditions).

The symmetry or antisymmetry of the wave functions is imposed to take 
into account the specific quantum nature of the particles which can be either 
bosons or fermions, the latter corresponding to a system of particles which, 
besides the interaction energy $\Phi$ in (3.3.4), also have the extra interaction
(classically nonstandard but quantum mechanically very natural) that no 
two particles with the same momentum can occupy the same position. 

The stability notion for the interaction $\Phi$ is important in quantum
statistical mechanics as much as it is important for classical statistical
mechanics. An interaction is called quantum mechanically stable if there is a
constant B such that the Schrödinger operator H, (3.3.4), for N identical
particles satisfies, for all N > 0,

$$H \ge -BN \qquad (3.3.5)$$


where the inequality holds in the sense of the operators (i.e. for any
normalized quantum state $|\Psi\rangle$, $\langle\Psi| H |\Psi\rangle \ge -BN$).

It is interesting and important to note that the inequality (3.3.5) can now
be valid even if the infimum $\inf \Phi(q)$ of $\Phi$ equals $-\infty$ because
the potential $\varphi$ becomes $-\infty$ at 0 distance. In fact one can no
longer separate the potential and kinetic energy as independent quantities:
the uncertainty principle forbids concentrating too many particles in too
small a box without giving to them a high kinetic energy. Hence there is the
possibility that the decrease in potential energy due to too many close
particles (contributing a large negative potential energy if $\varphi(0) < 0$
or $\varphi(0) = -\infty$) is compensated by the increase of kinetic energy
necessary to achieve the confinement. Whether this really happens or not
depends on the system (mainly on the bosonic or fermionic nature of the
particles) and has to be quantitatively checked. It will be briefly discussed
in Chap.IV.

The case in which the system contains several species of identical particles is 
treated as easily as in classical statistical mechanics. In the latter case it was 
sufficient to introduce suitable combinatorial coefficients to take the identity 
of the particles into account, see (2.2.19),(2.2.20); in the quantum case one 
shall simply require the symmetry or antisymmetry of the wave functions 
with respect to the permutations of the positions of identical particles. 

For instance a system of $N_1$ electrically charged particles with charge +e
and of $N_2$ particles with charge −e interacting with the Coulomb force would
have, in the classical canonical ensemble, the partition function
where $\Phi$ is a potential describing a nonelectric force between the particles
and $m_+, m_-$ are the masses of the two species.

In the quantum case one has instead: $Z(\beta, V) = \mathrm{Tr}\, \exp -\beta H$,
where H is the Schrödinger operator:


$$H = -\frac{\hbar^2}{2m_+} \sum_{i=1}^{N_1} \Delta_{q_i} - \frac{\hbar^2}{2m_-} \sum_{j=1}^{N_2} \Delta_{q'_j} + \Phi(q, q')$$


considered as an operator acting on the space of functions
$f(q_1, \ldots, q_{N_1}, q'_1, \ldots, q'_{N_2})$ symmetric or antisymmetric
with respect to the permutations of the first $N_1$ variables or of the second
$N_2$, but with no symmetry property with respect to “mixed” permutations.

The statistics, as one often calls the symmetry properties of the wave
functions¹ with respect to the permutations of their arguments, plays an
essential role in the theory. From the classical viewpoint adopted in §3.1
above we already had a glimpse of the phenomena that may make quantum
statistical mechanics quite “strange” even from a qualitative point of view,
at least at low temperatures. The reason is that the statistics can be
interpreted as a special (simple) further interaction (i.e. no interaction in
the case of bosons and a repulsive interaction in phase space for the
fermions): compare (3.2.14) and (3.2.20).

But the statistics may play a role even at ordinary temperatures: for 
instance electrically neutral systems in which particles interact only via 
Coulomb forces are unstable in classical statistical mechanics, at all temper- 
atures, for the trivial reason that the Coulomb potential between particles 
of opposite charge is unbounded below near the origin. But in quantum 
statistical mechanics they are stable if the charged particles satisfy Fermi 
statistics or if the bosons have charges of only one sign. See Chap.IV for 
a discussion of the importance of stability in statistical mechanics even in 
systems in which no charged particles are present. 


§3.4. Philosophical Outlook on the Foundations of Statistical Mechanics


Contemporary (i.e. AD2000) equilibrium statistical mechanics can be said 
to be in an ideal conceptual stage of development. 


1 Which form the space on which the Hamiltonian acts as an operator 
(1) There seem to be no fundamental theoretical problems after the
irreversibility of macroscopic evolution has been shown to be compatible with
the reversibility of microscopic dynamics: this understanding already came
about at Boltzmann’s time in terms of the existence of time scales of very
different orders of magnitude over which irreversibility and reversibility can
manifest themselves. It was put in a rigorous mathematical form by Lanford,
see §1.8: and this should have pacified the stubborn believers in the
incompatibility between microscopic reversibility and macroscopic
irreversibility (it did so only very partially, in fact!).


(2) The paradoxes to which classical statistical mechanics leads have been 
understood in terms of quantum effects and the conditions of applicability 
of classical statistical mechanics have been correspondingly precisely
formulated, see (3.2.21), (1.2.16), (2.2.2), and §2.6.

Many questions on the dynamics of the approach to equilibrium in
many-particle systems are still to be understood, and reliable (and
universally recognized as correct) methods to evaluate the time scales
relevant in the phenomena of approach to equilibrium are still to be
developed, see Chap.I, Chap.IX.

The ergodic problem is still not well understood particularly in systems 
close to mechanical equilibrium positions (as in oscillations in crystals) 
where it might even be conceivable that the ergodic hypothesis really fails 
in a substantial way, [FPU55]. This is so in spite of the major success 
achieved by Sinai in proving the ergodicity of a really interesting physical 
system (two balls in a periodic box, [Si70]) and its extension to many balls 
in a box [KSiS95]. On the whole the scarce understanding of nonequilibrium 
phenomena is reflected also in major problems in the kinetic theories of gases 
and liquids and of the related transport phenomena, [Co69],[Co93],[Do98]. 

The importance of the latter question has been strongly stressed by L. 
Galgani and by his collaborators who have devoted to the subject several 
important studies which led to a much better understanding of the relevance 
of the (probable) lack of ergodicity on time scales as long as the life of the 
Universe. The investigations stem from, and develop, “forgotten” remarks 
of “founding fathers” like Jeans, [GS72], [BGG93], [Be97]. 

But open problems abound also in equilibrium statistical mechanics. 

The central problem of equilibrium statistical mechanics is perhaps the 
theory of phase transitions and of the corresponding critical points. There 
is no evidence of fundamental difficulties and recently some clarification 
has been achieved in the phase transition phenomenon as a phenomenon of 
instability with respect to boundary conditions, or of sensitive dependence of 
the equilibrium state on the boundary conditions, see Chap.II and Chap.V.

Via simple soluble models, see Chap.VII, it has been shown how even the 
simplest models of mechanical systems (like systems of magnetic spins on 
a lattice) can show nontrivial phase transitions and, in fact, very interest- 
ing ones. Nevertheless very important phenomena, such as the liquid-gas 
transition or the crystal-liquid transition, are not really understood. 

In fact there is no model that could be treated avoiding approximations 
which are really out of control and which describes one such transition.
By “out of control” I mean approximations that have to be conceptually 
regarded as parts of the model itself because their influence on the results 
cannot be estimated “without hand waving”. See, however, [Jo95] and the 
very recent [MLP98] for very encouraging steps in the right direction. 

A meager consolation comes from the reassuring confirmation of the theo- 
retical possibility (i.e. consistency) of such transitions accompanied by the 
development of many approximate theories that are continuously generated 
(and used in concrete applications: the ultimate goal for a wide class of 
scientists). 

The first among such theories is the mean field theory which until the 
1930s was the only available theory for the study of phase transitions. This 
is a simple theory, see Chap.V, but somewhat too rough (so as to predict 
phase transitions even in systems which can be shown to have none, like 
one-dimensional systems with short-range interactions). 

The theory of phase transitions has undergone important developments 
mainly for what concerns the theory of critical phenomena in the context 
of which new approximate theories have been developed which provide the 
first real novel theoretical proposals after mean field theory; they are known 
as the renormalization group approach of Fisher, Kadanoff and Wilson, 
[WF72], [Wi83]. We cannot deal with such developments in this monograph: 
the reader will find modern accounts of them in [BG95], [Fi98].

Another important phenomenon of equilibrium statistical mechanics is that 
of metastability and it is still not well understood: its theory involves dealing 
with ideas and methods (and difficulties) characteristic both of the evolution 
and of the equilibrium problems. Here we shall not deal with this matter, 
see [LP79], [CCO74], [MOS90]. 

Another class of not well understood phenomena are equilibrium and 
nonequilibrium phenomena in charged particles systems: until recently it 
was even qualitatively unclear how a neutral system of charged particles 
(i.e. matter) could stay in thermodynamic equilibrium, notwithstanding the 
strong intensity and long range of the Coulomb interaction, see [Fe85] for 
a detailed analysis of the basic mechanism. Until very recently only phe- 
nomenological theories for phase transitions were available, essentially based 
on the same type of ideas at the roots of mean field theory (for instance 
Debye’s screening theory). 

Recently the problem of stability of matter (i.e. of proving a lower bound 
proportional to N on the energy of N charged particles with zero total 
charge) has been satisfactorily solved in the framework of quantum sta- 
tistical mechanics, [DL67], [LL72],[LT75], but the problem of a quantita- 
tive understanding of the thermodynamic equilibria in neutral aggregates 
of charges and of the related screening phenomena remains open, [Fe85], 
[Li81]. Of course there are plenty of very elaborate and detailed
phenomenological theories, but here we mean that they are not fundamental
and that, to be developed, require further assumptions (besides the
fundamental assumption that equilibrium states are described, say, by the
canonical ensemble) that are only justified on a heuristic basis necessary to
bypass otherwise insurmountable “technical” difficulties.

For instance in the theory of molecular gases one usually postulates that 
a given system just consists of identical particles, with given time-invariant 
properties (molecules), that interact with each other via effective forces due 
to screened electromagnetic interactions. This is clearly an approximation 
(that we empirically think of as perfectly adequate) which obviously ignores 
an important part of the problem: namely that the molecules are formed 
by atoms which are formed by nuclei and electrons (forgetting protons, 
neutrons, quarks, etc), and the possibility that they dissociate, ionize or 
react chemically. Therefore one may wish to see a microscopic explanation 
of why in a range of densities and temperatures (absolutely crucial for our 
lives) matter presents itself mostly bound into complexes which have the 
size of isolated atoms and molecules: this is still not understood although 
impressive progress has been achieved in the field, [Fe85]. 

Quantum statistical mechanics not only solves the conceptual problem of
the stability of matter, [Li81], but it also introduces the possibility of a the- 
oretical understanding of a large variety of new phenomena typically related 
to the quantum nature of microscopic physics: superfluidity and supercon- 
ductivity are typical examples. So far such phenomena are understood only 
on the basis of phenomenological theories close in spirit to the mean field 
theory of phase transitions, [BCS57], [Br65] and Chap. 10,11 of [Fe72]. But 
a deeper theory has still to be developed. In fact one can say that in quan- 
tum statistical mechanics all the problems of classical statistical mechanics 
are present, usually in an unsolved form even when the corresponding classi- 
cal problems are solved, and new problems that do not even exist in classical 
mechanics become analyzable theoretically. 

It does not appear that any of the problems that are not understood are 
not understandable in the framework of statistical mechanics (classical or 
quantum as the case applies): no fundamental problem seems to have a 
theoretical description that is in conflict with experimental results. This 
paradisiac atmosphere may not last for long (its stability would be very 
surprising indeed) but as long as it lasts it gives us great peace of mind 
while still offering us a wide variety of fascinating unsolved problems. 

Finally we mention that statistical mechanics is related to many branches 
of mathematics, particularly probability and information theory that have 
received a great influx of new ideas from the theory of phase transitions and 
of ensembles, [Ru69]; and the theory of dynamical systems that has received 
influx from the theory of approach to equilibrium, [RT71], [Ru78a], [Do98].
Combinatorics has been greatly widened by studies of the exactly soluble 
models in statistical mechanics, [Ba82]. 

Many problems in ordinary or partial differential equations have their origin 
in statistical mechanics which has also inspired several developments in the 
theory of turbulence, [Fr97], and in the theory of quantum fields, [BG95]. 

One can say that the present state of statistical mechanics is perhaps
comparable to the state of mechanics at the moment of its triumphal
applications to celestial mechanics and to ordinary mechanics at the end of
the ’700s and the beginning of the ’800s. No obvious contradiction with
experiments has yet come up and nevertheless many simple and interesting
phenomena remain to be explained by the theory. A sign of vitality is also 
to be seen in the fact that statistical mechanics continues to generate new 
and deep mathematical problems: one can perhaps say, as a nontautologi- 
cal statement, that physical theories are sources of interesting mathematical 
problems only as long as they are really alive and faced with difficulties that 
are not purely technical. 




Chapter IV: 


Thermodynamic Limit and Stability 
§4.1. The Meaning of the Stability Conditions


The stability and temperedness conditions (see (2.2.17),(2.2.18)):


$$\begin{aligned}
\Phi(q) &= \sum_{i<j} \varphi(q_i - q_j) \ge -BN && \text{stability}\\
|\varphi(q - q')| &\le C\,|q - q'|^{-(3+\varepsilon)} \quad \text{for } |q - q'| > r_0 && \text{temperedness}
\end{aligned} \qquad (4.1.1)$$


for suitable constants $C > 0$, $\varepsilon > 0$, $r_0 > 0$, have to be imposed
on the interaction potential in order to ensure the existence of the limits in
(2.3.8), (2.3.9) or (2.5.12) defining $f_c$, $s_m$ or $p_{gc}$ respectively,
i.e. the thermodynamic functions associated with the partition functions of
various orthodic ensembles. The conditions are particularly interesting
because, besides the above
mathematical role, they have a simple, and profound, physical meaning. 
The existence of the limits is a necessary requirement to have orthodicity 
and equivalence of the thermodynamic models defined by the different en- 
sembles, as discussed in Ch.II. We have seen that if the above limits exist 
then the ensembles define the same model of thermodynamics for a given 
system. Therefore to understand the significance of the stability conditions 
it is convenient to examine their meaning in the thermodynamics model de- 
fined by one of the ensembles and we shall choose the canonical ensemble, 
where the analysis is simplest. The following analysis also illustrates some 
of the typical methods that are used in statistical mechanics. 


(a) Coalescence Catastrophe due to Short-Distance Attraction. The first
condition in (4.1.1) can be violated in several ways. One possible way is
when the potential $\varphi$ is negative at the origin: we always assume that
$\varphi$ is a smooth function for $q \ne 0$ and, in the case at hand, we also
assume that $\varphi$ is smooth at the origin.

Let $\delta > 0$ be fixed so small that the potential between two particles at
distances $\le 2\delta$ is $\le -b < 0$. Consider the canonical ensemble element
with parameters $\beta$, N for a system enclosed in a (cubic) box of volume V.
We want to study the probability that all the N particles are located in a
little sphere of radius $\delta$ around the center of the box (or, for that
matter, around any pre-fixed point of the box).

The potential energy of such a configuration is
$\Phi \le -b\binom{N}{2} \sim -\frac{b}{2}N^2$ (because there are
$\binom{N}{2}$ pairs of particles interacting with an energy $\le -b$)
therefore the canonical probability of the collection $\mathcal{C}$ of such
configurations will be


$$P_{collapse} = \frac{\int_{\mathcal{C}} \frac{dp\,dq}{h^{3N} N!}\, e^{-\beta(K(p)+\Phi(q))}}{\int \frac{dp\,dq}{h^{3N} N!}\, e^{-\beta(K(p)+\Phi(q))}} \ge \frac{\left(\frac{4\pi}{3}\right)^N \frac{\delta^{3N}}{N!}\, e^{\beta b \frac{1}{2} N(N-1)}}{\int \frac{dq}{N!}\, e^{-\beta\Phi(q)}} \qquad (4.1.2)$$
where we see that the contribution to the integral in the numerator can
be considered as due to two factors. One is the value of the integrand
function, which we call an “energy” factor, and one is the volume of the
configuration space where the function takes the value considered, which we
call an “entropy factor” or a “phase space factor”. The first is $\beta$
dependent while the second is not.

The phase space factor is extremely small because the configurations that we
consider are very special: nevertheless such configurations are far more
probable than the configurations which “look macroscopically correct”,
i.e. configurations in which the particles are more or less spaced by the
average particle distance that we expect in a macroscopically homogeneous
configuration namely $(N/V)^{-1/3} = \rho^{-1/3}$.
The latter configurations, whose collection we denote $\mathcal{R}$, will have
a potential energy $\Phi(q)$ of the order of uN for some u, so that their
probability will be bounded above by


$$P_{regular} = \frac{\int_{\mathcal{R}} \frac{dp\,dq}{h^{3N} N!}\, e^{-\beta(K(p)+uN)}}{\int \frac{dp\,dq}{h^{3N} N!}\, e^{-\beta(K(p)+\Phi(q))}} \le \frac{\frac{V^N}{N!}\, e^{-\beta u N}}{\int \frac{dq}{N!}\, e^{-\beta\Phi(q)}} \qquad (4.1.3)$$


and we see that
(1) the denominators in (4.1.2),(4.1.3) are (of course) equal and


(2) the phase space factor in the numerator in (4.1.3) is much larger than
the corresponding one in (4.1.2) (i.e. $V^N$ against $\delta^{3N}$), at least
in the “thermodynamic limit” $V \to \infty$, $N \to \infty$, $N/V \to v^{-1}$.


However, no matter how small $\delta$ is, the ratio $P_{regular}/P_{collapse}$
will approach 0 as $V \to \infty$, $N/V \to v^{-1}$; extremely fast because
$e^{\beta b N^2/2}$ eventually dominates over $V^N \sim e^{N \log N}$.

This means that it is far more probable to find the system in a microscopic
volume of size $\delta$ rather than in a configuration in which the energy has
some macroscopic value proportional to N: note that in a free gas, for
instance (where $\Phi$, b = 0), the situation is the opposite, and in general,
if the stability property in (4.1.1) holds, the above argument also does not
apply.
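
The competition between the energy factor and the phase space factor can be
made concrete with a small numerical sketch. It compares the logarithms of
the numerators of the bounds (4.1.2) and (4.1.3) (the factor 1/N! and the
denominators are common and cancel in the ratio) at fixed specific volume
v = V/N. The parameter values and the function names are illustrative
assumptions of ours.

```python
import math

def log_collapse_lower(n, delta, b, beta):
    """log of the numerator of the lower bound (4.1.2), without the common 1/N!."""
    return n * math.log(4.0 * math.pi / 3.0 * delta**3) + beta * b * n * (n - 1) / 2.0

def log_regular_upper(n, volume, u, beta):
    """log of the numerator of the upper bound (4.1.3), without the common 1/N!."""
    return n * math.log(volume) - beta * u * n

def log_ratio_bound(n, v, delta, b, u, beta):
    """Upper bound on log(P_regular / P_collapse) at specific volume v = V/N."""
    return log_regular_upper(n, n * v, u, beta) - log_collapse_lower(n, delta, b, beta)

# Assumed values: a weak attraction b and a very small collapse radius delta.
params = dict(v=1.0, delta=1e-3, b=0.1, u=-1.0, beta=1.0)
for n in (10, 100, 1000, 10000):
    print(n, log_ratio_bound(n, **params))
```

With these assumed values the bound is still positive for N = 10 and N = 100,
where the phase space factor wins, but it is negative for N = 1000 and keeps
decreasing, because the term quadratic in N eventually dominates the terms of
order N log N.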

This catastrophe can also be called an ultraviolet catastrophe as it is due to 
the behavior of the potential at very short distances: it causes the collapse 
of the system into configurations concentrated in regions as small as we 
please (in the thermodynamic limit). 


(b) Coalescence Catastrophe due to Long-Range Attraction. This is a more
interesting catastrophic behavior, because of its physical relevance. It occurs
when the potential is too attractive near $\infty$. To simplify matters we
suppose that the potential has a hard core, i.e. it is $+\infty$ for
$r < r_0$, so that the above discussed coalescence cannot occur and the system
cannot assume configurations in which the density is higher than a certain
quantity $\rho_{cp} < \infty$, called the close packing density.

The catastrophe occurs if $\varphi(q) \sim -g|q|^{-3+\varepsilon}$,
$g, \varepsilon > 0$, for |q| large. For instance this is the case of matter
interacting gravitationally; if k is the gravitational constant, m is the
particles mass (assuming an identical particles system), then $g = km^2$ and
$\varepsilon = 2$.

In this case the probability of “regular configurations”, where particles
are at distances of order $\rho^{-1/3}$ from their close neighbors, is compared
with that of “catastrophic configurations”, with the particles at distances
$r_0$ from their close neighbors to form a configuration in “close packing”
(so that $r_0$ is equal to the hard core radius). Note that in the latter case
the system does not fill the available volume and leaves empty a region whose
volume is a fraction $\frac{\rho_{cp} - \rho}{\rho_{cp}}$ of V.

A regular configuration will have a probability (in the canonical ensemble
with parameters $\beta$, N and if L is the diameter of V) proportional to


$$P_{regular} = C\, \frac{V^N}{h^{3N} N!}\, e^{\frac{1}{2} \beta g \rho N \int_{r_0}^{L} |q|^{-3+\varepsilon}\, dq} \qquad (4.1.4)$$


because the energy of interaction of a single particle in a medium with
density $\rho$ is, to leading order in $L \to \infty$,
$-\int_{r_0}^{L} \rho g |q|^{-3+\varepsilon}\, dq \propto -\rho g L^{\varepsilon}$;
in (4.1.4) C is a normalization constant (i.e. it is the reciprocal of the
canonical partition function).
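
The growth of the single-particle interaction energy with the size of the
container can be checked with the closed form of the integral. The sketch
below assumes a uniform medium of density rho and the attractive tail
$-g|q|^{-3+\varepsilon}$ between the hard core radius and the diameter L; the
function name and the numerical values are hypothetical.

```python
import math

def energy_per_particle(rho, g, eps, r0, diameter):
    """Energy of one particle in a uniform medium with pair potential -g |q|**(-3 + eps).

    The volume element is 4 pi r**2 dr, so the integrand is 4 pi r**(eps - 1)
    and the integral between r0 and the diameter is elementary.
    """
    return -rho * g * 4.0 * math.pi * (diameter**eps - r0**eps) / eps

# Gravitational-like case eps = 2: the energy per particle grows like L**2.
for L in (10.0, 100.0, 1000.0):
    print(L, energy_per_particle(rho=0.1, g=1.0, eps=2.0, r0=1.0, diameter=L))
```

The energy per particle is therefore not bounded as the container grows,
which is the signature of the violation of the temperedness condition.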

Likewise if we consider a configuration in close packing and we dilate it by a
factor $(1+\delta)$ we obtain a configuration in which each particle can be
moved in a small sphere of radius $O(\rho_{cp}^{-1/3}\delta)$ and we can call
such configurations catastrophic or collapsed as they occupy only a part of
the volume allowed, no matter how large the latter is (i.e. no matter how
small $\rho$ is compared to $\rho_{cp}$).

In the canonical ensemble with parameters $\beta$, N the probability of the
catastrophic configurations can be bounded below by


$$P_{catastrophic} \ge C\, \frac{(\rho_{cp}^{-1}\delta^3)^N}{h^{3N} N!}\, e^{\frac{1}{2} \beta g N \rho_{cp}(1+\delta)^{-3} \int_{r_0}^{L} |q|^{-3+\varepsilon}\, dq} \qquad (4.1.5)$$


where the constant C is the same normalization constant as in (4.1.4); and
again we see that the catastrophic configurations, in spite of their very
low phase space volume, have a much larger probability than the regular
configurations, if $\rho < \rho_{cp}$ and $\delta$ is small enough: because the
exponent of the energy factor in (4.1.5) is proportional to
$\rho_{cp}(1+\delta)^{-3}$ rather than to $\rho$, as in (4.1.4), and it grows
faster than the exponent of the phase space factor.

A system which is too attractive at infinity will not occupy the volume we 
give to it but will stay confined in a close packed configuration even in empty 
space. 

This is important in the theory of stars: stars cannot be expected to obey
“regular thermodynamics” and in particular will not “evaporate” because
their particles interact via the gravitational force at large distance. Stars
do not occupy the whole volume given to them (i.e. the universe); they do
not collapse to a point only because the interaction has a strongly repulsive
core (even when they are burnt out and the radiation pressure is no longer
able to keep them at a reasonable size, a reasonable size being, from an
anthropocentric viewpoint, the size of the Sun).


(c) Evaporation Catastrophe: this is another infrared catastrophe, i.e. a
catastrophe due to the long-range structure of the interactions like (b)
above; it occurs when the potential is too repulsive at $\infty$: i.e.
$\varphi(q) \sim +g|q|^{-3+\varepsilon}$ as $q \to \infty$ so that the
temperedness condition is again violated.

Also in this case the system does not occupy the whole volume: it will
generate a layer of particles sticking in close packed configuration to the
walls of the container. Therefore if the density is lower than the close
packing density, $\rho < \rho_{cp}$, the system will leave a region around the
center of the container empty; and the volume of the empty region will still
be of the order of the total volume of the box (i.e. its diameter will be a
fraction of the box side L with the value of the fraction not depending on L
as $L \to \infty$).

The proof of this statement is completely analogous to the one of the 
previous case, except that now the configuration with lowest energy will be 
the one sticking to the wall and close packed there, rather than the one close 
packed at the center. 

Also this catastrophe is very important as it is realized in systems of 
charged particles bearing the same charge: the charges adhere to the bound- 
ary in close packing configuration and dispose themselves so that the elec- 
trostatic potential energy is minimal. We cannot, therefore, expect that the 
charges that we deposit on a metal will occupy the whole volume: they will 
rather form a surface layer minimizing the potential energy (i.e. so that the 
Coulomb potential in the interior is constant). They do not behave thermo- 
dynamically: for instance, besides not occupying the whole volume given to 
them, they will not contribute normally to the specific heat. 


§4.2. Stability Criteria


There are simple criteria that make sure that the conditions (4.1.1) are
satisfied. The first condition is satisfied, in general, if $\varphi \ge 0$:
one calls this case the repulsive potential case, although this is a somewhat
improper definition because $\varphi \ge 0$ does not imply that $\varphi$ is
monotonically decreasing (which would in fact generate a repulsive force in
the usual sense of the word). In this case one can take B = 0.

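Another interesting case is that of a smooth potential $\varphi$ which has a
nonnegative Fourier transform $\hat\varphi$. In fact in this case:


$$\begin{aligned}
\Phi(q_1,\ldots,q_n) &= -\frac{1}{2}\varphi(0)\,n + \frac{1}{2}\sum_{i,j=1}^{n} \varphi(q_i - q_j)\\
&= -\frac{1}{2}\varphi(0)\,n + \frac{1}{2}\int dk\, \hat\varphi(k) \sum_{i,j=1}^{n} e^{ik\cdot(q_i - q_j)} \ge -\frac{1}{2}\varphi(0)\,n
\end{aligned} \qquad (4.2.1)$$


because $\sum_{i,j=1}^{n} e^{ik\cdot(q_i - q_j)} = \left|\sum_{j=1}^{n} e^{ik\cdot q_j}\right|^2 \ge 0$,
if $\hat\varphi \ge 0$. This will be called the positive definite potential, or
positive type, case and one can take $B = \frac{1}{2}\varphi(0)$ (and a
fortiori $B = \varphi(0)$).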
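
A numerical sanity check of the positive type criterion can be sketched as
follows. The pair potential $\varphi(q) = (3 - |q|^2)\,e^{-|q|^2/2}$ is our own
illustrative choice: in three dimensions its Fourier transform is proportional
to $k^2 e^{-k^2/2} \ge 0$, although the potential itself is negative for
$|q| > \sqrt{3}$. The energy per particle of randomly sampled configurations
should then never fall below $-\frac{1}{2}\varphi(0) = -1.5$. The helper names,
the box size and the number of trials are assumptions.

```python
import math
import random

def phi(r2):
    """Pair potential (3 - |q|^2) exp(-|q|^2 / 2) as a function of r2 = |q|^2."""
    return (3.0 - r2) * math.exp(-0.5 * r2)

def potential_energy(points):
    """Phi(q_1, ..., q_n) = sum over pairs i < j of phi(q_i - q_j)."""
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            r2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            total += phi(r2)
    return total

def min_energy_per_particle(n, box, trials, seed=0):
    """Smallest Phi / n found among random configurations in a cube of side box."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(trials):
        pts = [tuple(rng.uniform(0.0, box) for _ in range(3)) for _ in range(n)]
        best = min(best, potential_energy(pts) / n)
    return best

bound = -0.5 * phi(0.0)          # the stability constant -B with B = phi(0)/2
worst = min_energy_per_particle(n=40, box=6.0, trials=200)
print(worst, ">=", bound)
```

Random sampling cannot prove stability, of course; it only fails to find a
counterexample, as the inequality guarantees.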

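Of course a potential that can be expressed as a sum of a positive potential
and of a positive type potential is also stable. A remarkable case of a
potential that can be expressed in this way is a potential such that, for
$C, C', \varepsilon, r_0 > 0$, it is:


$$\varphi(q) \ge C\,|q|^{-(3+\varepsilon)} \ \text{ for } |q| \le r_0, \qquad |\varphi(q)| \le C'\,|q|^{-(3+\varepsilon)} \ \text{ for } |q| > r_0 \qquad (4.2.2)$$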


Such a potential is sometimes called a Lennard-Jones potential although it 
is more general than the potential that was originally introduced with this 
name (see §1.2). 

The proof of the possibility of representing $\varphi$ as a sum of a positive
potential and of a positive type potential can be found in [FR66]; but the
stability of a potential satisfying (4.2.2) can be checked directly very
simply, [Mo56]. Given, in fact, a configuration $q_1, \ldots, q_n$, let r be
the minimum distance between pairs of distinct points:
$r = \min_{i \ne j} |q_i - q_j|$. Suppose that the pair of closest particles
is $q_1, q_2$ and that $r < \frac{1}{2} r_0$; then


$$\Phi(q_1, \ldots, q_n) \ge \Phi(q_2, \ldots, q_n) + \varphi(q_1 - q_2) + \sum_{j:\, |q_1 - q_j| \ge r_0} \varphi(q_1 - q_j) \qquad (4.2.3)$$


where the equality would have held had we summed over all the j’s, i.e. also
over the j’s such that $|q_j - q_1| < r_0$.


Around each of the $q_j$ we can draw a cube $Q_j$ with side $r/\sqrt{12}$ and
$q_j$ being the vertex farthest away from $q_1$. Since any two points among
$q_1, \ldots, q_n$ have a distance $\ge r$ the cubes thus constructed do not
overlap, and their union is contained in the complement of the sphere
$|q - q_1| < r_0 - \frac{r}{2}$, hence in the region
$|q - q_1| \ge \frac{3}{4} r_0$: furthermore


$$\frac{1}{|q_1 - q_j|^{3+\varepsilon}} \le \frac{(\sqrt{12})^3}{r^3} \int_{Q_j} \frac{dq}{|q_1 - q|^{3+\varepsilon}} \qquad (4.2.4)$$


so that the sum in (4.2.3) is bounded below, for some $C_1 > 0$, by


$$-C'\, \frac{(\sqrt{12})^3}{r^3} \int_{|q| \ge \frac{3}{4} r_0} \frac{dq}{|q|^{3+\varepsilon}} \ge -C_1 \left(\frac{r_0}{r}\right)^3 \qquad (4.2.5)$$


and $\Phi(q_1, \ldots, q_n) \ge C r^{-(3+\varepsilon)} - C_1 \left(\frac{r_0}{r}\right)^3 + \Phi(q_2, \ldots, q_n)$
so that the sum of the first two terms is bounded below by some
$-C_2 > -\infty$, provided the assumption that the configuration
$q_1, \ldots, q_n$ is such that $r < \frac{1}{2} r_0$ holds.


The case $r \ge \frac{1}{2} r_0$ is easier because the density is bounded above
by $\sim 8 r_0^{-3}$ and the interaction decreases summably at $\infty$: a
repetition of the above considerations yields simply that
$\Phi(q_1, \ldots, q_n) = \Phi(q_2, \ldots, q_n) + \sum_{j \ge 2} \varphi(q_1 - q_j) \ge \Phi(q_2, \ldots, q_n) - C_3$
for a suitable $C_3 > 0$. Hence if $b = \max(C_2, C_3)$ then
$\Phi(q_1, \ldots, q_n) \ge -b + \Phi(q_2, \ldots, q_n) \ge -bn$ and stability
is proved.

We conclude by noting that (4.2.2) are sufficient stability and temperedness 
conditions. But in general they are far from being necessary. 

In fact one would like to prove that certain systems not fulfilling (4.2.2) are 
nevertheless stable and have a well-defined thermodynamics, equivalently 
described by the canonical ensemble or by other ensembles. 

For instance a gas composed of electrically charged particles and with 0 
total charge is an example of a system for which we would like to prove 
stability, and even orthodicity of the classical ensembles. 

Note that it is easy to see that a gas of charged classical particles is not 
stable: just consider the configuration in which pairs of opposite charges are 
put very close to each other, while the pair centers of mass are essentially 
equispaced; this configuration has an energy which can be made < —bN for 
all b’s, by pulling the pairs close enough together. 
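The instability just described can be made concrete with a small numerical sketch. The configuration below (neutral pairs of unit charges on a line, in Gaussian units) and all function names are illustrative assumptions, not part of the original argument; the point is only that the energy per particle behaves like $-1/(2\delta)$ when the pair separation $\delta$ shrinks, so no bound of the form $-bN$ can hold.

```python
import itertools
import math


def coulomb_energy(charges, positions):
    """Sum over pairs of e_i e_j / |q_i - q_j| (Gaussian units)."""
    total = 0.0
    pairs = itertools.combinations(zip(charges, positions), 2)
    for (e_i, q_i), (e_j, q_j) in pairs:
        total += e_i * e_j / math.dist(q_i, q_j)
    return total


def paired_configuration(n_pairs, spacing, delta):
    """Neutral pairs (+1, -1): pair centers equispaced by `spacing` on a line,
    the two charges of each pair at distance `delta` from each other."""
    charges, positions = [], []
    for k in range(n_pairs):
        center = k * spacing
        charges += [1.0, -1.0]
        positions += [(center - delta / 2, 0.0, 0.0),
                      (center + delta / 2, 0.0, 0.0)]
    return charges, positions


def energy_per_particle(n_pairs, spacing, delta):
    """Total Coulomb energy divided by the number of particles."""
    charges, positions = paired_configuration(n_pairs, spacing, delta)
    return coulomb_energy(charges, positions) / len(charges)
```

Calling `energy_per_particle(8, 1.0, delta)` for decreasing `delta` shows the energy per particle decreasing without bound, which is the content of the remark above.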

If, however, the particles also have a hard core interaction besides the
Coulomb interaction, so that pairs of particles cannot be closer than some
$r_0 > 0$, the system becomes stable as remarked by Onsager. In fact the
potential of interaction between the charges will be the same as that which
they would have if replaced by uniform balls of charge $e_j$ uniformly
distributed in a sphere of radius $\frac12 r_0$ around the $j$-th particle. Then by using
the fact that there is a hard core

$$\Phi = \sum_{i<j} \frac{e_i e_j}{|q_i-q_j|} = \frac12 \sum_{i\ne j}^{1,n} \int \frac{e_i\sigma_{q_i}(q)\, e_j\sigma_{q_j}(q')}{|q-q'|}\,dq\,dq' \tag{4.2.6}$$

where $\sigma_{q_j}(q)$ does not vanish only in the sphere with radius $\frac12 r_0$ around $q_j$, and
it is constant there and with integral 1. Here the integration domain is the
product of two balls $\{q_i\}, \{q_j\}$ of radius $\frac12 r_0$ centered at $q_i$ and $q_j$. Hence

$$\begin{aligned}
\Phi = {} & -\sum_{i=1}^{n} \frac{e_i^2}{2} \int_{\{q_i\}\times\{q_i\}} \frac{\sigma_{q_i}(q)\,\sigma_{q_i}(q')}{|q-q'|}\,dq\,dq' \\
& + \frac12 \int \frac{\big(\sum_i e_i\sigma_{q_i}(q)\big)\big(\sum_j e_j\sigma_{q_j}(q')\big)}{|q-q'|}\,dq\,dq'
\end{aligned} \tag{4.2.7}$$

If the number of species is finite it is clear that the sum in the first line can
be bounded below by $-nb$, if $n$ is the total number of particles. The double
sum is simply proportional to:

$$\int \frac{|\hat\sigma(k)|^2}{|k|^2}\,dk \ge 0 \tag{4.2.8}$$

where $\hat\sigma(k)$ is the Fourier transform of $\sum_j e_j\sigma_{q_j}(q)$, and we have used that
the Fourier transform of the Coulomb potential is proportional to $k^{-2}$.
Hence (4.2.8) shows that a gas of a few different species of charged par- 
ticles, interacting also via a hard core potential (and possibly any further 
additional stable potential), is stable: and one should also note that stability 
does not even depend on the system being neutral. 
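Onsager's argument gives an explicit stability constant that can be checked numerically. The sketch below assumes unit-strength point charges with hard core diameter $r_0$ and uses the standard electrostatic fact that a uniformly charged ball of radius $R$ and charge $e$ has self-energy $\frac35 e^2/R$; with $R = \frac12 r_0$ and the positivity (4.2.8) this gives $\Phi \ge -\frac65 \sum_j e_j^2/r_0$. The function names and the two sample configurations are illustrative assumptions.

```python
import itertools
import math
import random


def coulomb_energy(charges, positions):
    """Sum over pairs of e_i e_j / |q_i - q_j| (Gaussian units)."""
    total = 0.0
    pairs = itertools.combinations(zip(charges, positions), 2)
    for (e_i, q_i), (e_j, q_j) in pairs:
        total += e_i * e_j / math.dist(q_i, q_j)
    return total


def onsager_lower_bound(charges, r0):
    """Minus the self-energies (3/5) e^2 / (r0/2) of the smeared charges."""
    return -1.2 * sum(e * e for e in charges) / r0


def random_hard_core_positions(n, r0, box, rng, max_tries=100000):
    """Random sequential placement of n points in [0, box]^3 with mutual
    distances >= r0 (rejection sampling; fails loudly if too dense)."""
    positions = []
    for _ in range(max_tries):
        candidate = (rng.uniform(0.0, box), rng.uniform(0.0, box),
                     rng.uniform(0.0, box))
        if all(math.dist(candidate, q) >= r0 for q in positions):
            positions.append(candidate)
            if len(positions) == n:
                return positions
    raise RuntimeError("could not place all particles; lower the density")


def alternating_lattice(side, r0):
    """Charges (-1)^(i+j+k) on a side^3 cubic lattice of spacing r0."""
    charges, positions = [], []
    for i, j, k in itertools.product(range(side), repeat=3):
        charges.append(1.0 if (i + j + k) % 2 == 0 else -1.0)
        positions.append((i * r0, j * r0, k * r0))
    return charges, positions
```

The bound holds for neutral and non-neutral charge assignments alike, in agreement with the remark that stability does not depend on neutrality.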

This example shows that stability can occur under situations more general 
than (4.2.2), and the conditions in (4.2.2) may fail and nevertheless the 
system may be stable. 

A similar remark can be made for the temperedness condition: (4.2.2) 
is sufficient for temperedness but the violations of temperedness described 
above and leading to the above infrared catastrophes may be absent in 
special systems. For instance in a gas of charged particles that is over-all 
neutral this is what really happens, see [LL72]. 


§4.3. Thermodynamic Limit


A way to check that we are not missing some other basic condition on 
the potential is to show that, if the stability and temperedness conditions 
are satisfied, the thermodynamic limits exist and the basic ensembles are 
equivalent. 

We have considered in Ch.II the existence of the thermodynamic limit of
the entropy:

$$s(v,u) = \lim_{\substack{N\to\infty \\ V/N = v,\ U/N = u}} \frac{1}{N} \log \int_{E\le U} \frac{dp\,dq}{h^{3N} N!} \tag{4.3.1}$$

where $h^{3N}$ is the size of a phase space cell, see Ch.I and Ch.II.

And, at least at a heuristic level, we have seen that the existence of the limit
(4.3.1), the microcanonical entropy, is the key to the proof of the existence
of the limits for $f_c$, $p_{gc}$, see (2.3.8),(2.5.12), and for the equivalence of the
thermodynamics models based on the classical ensembles: microcanonical,
canonical and grand canonical.

Therefore we shall discuss the problem of the existence of the thermody- 
namic limit in (4.3.1), i.e. the problem of the existence of the entropy in the 
microcanonical ensemble: this is in some sense harder than the problem of 
showing the existence of the corresponding limits in the canonical or grand 
canonical ensembles, but it has the advantage of implying the results for 
the other ensembles as heuristically discussed in Ch.II: the argument given 
there is easily turned into a proof, under very general conditions. 

What follows is important not only because of the results that it establishes
but also because it illustrates some of the basic techniques used in
applications of statistical mechanics. In spite of its technical nature the reader
could be interested in following it as it provides a good understanding of
the physical meaning of the conditions (4.1.1) and of their relation with the
extensivity properties of thermodynamic functions.

Before proceeding it is necessary to warn however that the following
argument looks at first a bit subtle, but it is in fact quite straightforward and
the intricacy is only due to the fact that we must make sure that when
we divide the number of particles by certain factors (often but not always
2, 4 or 8) we get an integer number. If one does not pay attention to this
condition then the proof becomes trivial (although strictly speaking
incorrect). Another source of problems is that we need to say that the partition
function corresponding to a region that is the union of two subregions is
the product of the partition functions corresponding to the subregions, a
property that would be obvious if the regions were separated by a corridor
of width larger than the interaction range. But the regions that we have
to consider touch each other and we must produce corridors by shrinking
the regions and therefore we must compare partition functions relative to
a region and to the smaller region obtained by cutting out of it a layer of
width $\frac12 r_0$ around its boundary.

The reader should, on a first reading, simply disregard the technical details
related to the above counting and corridor problems and see that (4.3.6)
and the consequent (4.3.8) hold at least approximately, and the existence
of the limit then follows on the special sequence of cubes $B_n$ with side
$2^n L_0$ (for a fixed $L_0 > 0$ and identifying temporarily the cubes $B_n$ and $B'_n$);
subsequently (4.3.11), i.e. essentially still (4.3.6), implies both existence over
arbitrary sequences of cubes and shape independence.

One could invoke the fact that the numbers of particles are so large that a
change of the particle number by a few units or the taking out of a layer of
width $\frac12 r_0$ around the boundary of a region that is becoming infinite, makes
no difference “on physical grounds”; but this is precisely the point, as we
must show that this is correct.

To simplify the discussion we shall suppose that the interaction has finite 
range, i.e. it vanishes for |q| > ro. 


(A) Ground State Energy Convexity as a Function of the Density


We first consider a special sequence of boxes and a special sequence of
values of $N, U$. The boxes will be cubes $B'_n$ with side size $L'_n = 2^n L_0 - r_0$
where $L_0$ is an arbitrarily chosen unit of length; the boxes $B'_n$ are contained
in the cube $B_n$ with side size $L_n = 2^n L_0$ and stay away $\frac12 r_0$, at least, from
its boundaries. The volume $|B_n|$ is $2^{3n} L_0^3$.


Fig. 4.3.1


The figure illustrates the boxes $B_{n+1}$ and $B'_n$ in the corresponding
two-dimensional case (for simplicity); the region $B_n$ is represented by the first
“quarter” (i.e. the lower left square); the shaded areas represent the
corridors of width $r_0$ between the boundaries of the four copies of $B'_n$ and the
corresponding copies of $B_n$.

Given a density $\rho > 0$ we suppose first that it is dyadic, i.e. that it has the
form $\rho = m 2^{-3s} L_0^{-3}$ with $m, s$ positive integers. Hence if $n \ge s$ the number
$N_n = \rho|B_n|$ is an integer and we can define the “ground state energy” $e_\rho$ at
density $\rho$ by

$$e_\rho = \inf \frac{\Phi(q_1,\ldots,q_{N_n})}{|B_n|} \ge -B\rho \tag{4.3.2}$$

where$^1$ the infimum is taken over all $n \ge s$ and over all configurations
$q_1,\ldots,q_{N_n}$ in $B'_n$, and the last inequality is a consequence of the stability,
(4.1.1). Making a difference between $B'_n$ and $B_n$ is a convenient device that
allows us to simplify some minor points in the forthcoming analysis and it
should not be regarded as an issue. The quantity $e_\rho$ can be thought of as
the minimum energy density of the particle configurations with numerical
density $\rho$.

Note that if $\rho_1,\rho_2$ are two dyadic densities then also $\rho = \frac12(\rho_1 + \rho_2)$ is
a dyadic density. Furthermore if we consider a box $B'_{n+1}$ we see that it
contains $8 = 2^3$ boxes $B'_n$ separated by at least $r_0$.
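The integrality bookkeeping behind the dyadic densities can be checked with exact rational arithmetic. The following is a minimal sketch, assuming $L_0 = 1$ by default; the helper names are hypothetical and serve only to illustrate that $N_n = \rho|B_n|$ is an integer once $n \ge s$, that the midpoint of two dyadic densities is again dyadic, and that filling four sub-boxes at density $\rho_1$ and four at density $\rho_2$ gives exactly $\rho|B_{n+1}|$ particles.

```python
from fractions import Fraction


def box_volume(n, L0=Fraction(1)):
    """|B_n| = (2^n L0)^3."""
    return (2 ** n * L0) ** 3


def dyadic_density(m, s, L0=Fraction(1)):
    """rho = m 2^(-3s) L0^(-3) with m, s positive integers."""
    return Fraction(m, 2 ** (3 * s)) / L0 ** 3


def particle_number(rho, n, L0=Fraction(1)):
    """N_n = rho |B_n| as an exact fraction (an integer when n is large enough)."""
    return rho * box_volume(n, L0)
```

For instance with $\rho_1 = 3\cdot 2^{-6}$ and $\rho_2 = 2^{-3}$ the midpoint is $11\cdot 2^{-7} = 44\cdot 2^{-9}$, which becomes an integer particle number in $B_3$.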

If we put $\rho_1|B_n|$ particles in four of the boxes $B'_n$ and $\rho_2|B_n|$ in the other
4, it is clear, from the definitions, that for suitable $n$ the energy of such a
configuration can be made arbitrarily close to (or smaller than)
$4e_{\rho_1}|B_n| + 4e_{\rho_2}|B_n|$, so that:

$$e_\rho \le \frac12 (e_{\rho_1} + e_{\rho_2}). \tag{4.3.3}$$

In fact it is easy to see that $e_\rho \to 0$ as $\rho \to 0$ because (4.3.2) holds together with
the easily checked $e_\rho \le 4\rho \max_{|q| \ge \rho^{-1/3}} \varphi(q)$ if $\rho$ is small enough.$^2$

Therefore if we define $e_0 = 0$ the function $\rho \to e_\rho$ is continuous on the set
of dyadic densities on which it is uniquely defined (convex functions on the
dyadics are continuous) and it can be extended by continuity to all real $\rho$,
dyadic or not. We shall call $A$ the convex region above the graph of the
extension to all $\rho$'s of $\rho \to e_\rho$.


1 The quantity $e_\rho$ is defined to be $+\infty$ if one cannot fit $\rho|B_n|$ particles inside $B'_n$; this
may happen if the interaction contains a hard core part.

2 This is a bound of the energy of a configuration of points regularly spaced by $\rho^{-1/3}$.


(B) Upper Bound on Entropy.

Let $A$ be the region $\rho > 0$ and $e > e_\rho$; let $(\rho,e) \in A$ be an (interior)
point with $\rho$ dyadic and $e > e_\rho$. Given $\rho, e, V' \subset V$ let, to simplify the
notations, $N_0(\rho,e,V') \stackrel{def}{=} N_0(U,V)$ with $N_0$ the microcanonical partition
function, (2.3.5), $N = \rho V$, $U = eV$. In the following $V'$ will always be $V$
deprived of a small corridor near its boundary.

Define, for the $\rho$'s that are integer multiples of $2^{-3n}L_0^{-3}$, a corresponding
microcanonical entropy:

$$\sigma_n(\rho,e) = \frac{1}{|B_n|} \log N_0(\rho,e,B'_n) = \frac{1}{|B_n|} \log \int_{\substack{K(p)+\Phi(q) \le e|B_n| \\ q \in B'_n}} \frac{dp\,dq}{h^{3N_n} N_n!} \tag{4.3.4}$$

Then $\sigma_n$ can be bounded (because the potential $\Phi$ satisfies: $\Phi \ge -BN_n \ge
-B\rho|B_n|$) by:

$$\begin{aligned}
\sigma_n(\rho,e) & \le \frac{1}{|B_n|} \log \frac{|B_n|^{N_n}\, \Omega(3N_n) \big(\sqrt{2m|B_n|(e-e_\rho)}\big)^{3N_n}}{3N_n\, h^{3N_n}\, N_n^{N_n} e^{-N_n} \sqrt{2\pi N_n}} \le \\
& \le \rho \log \frac{(4\pi m e^{5/3}/3)^{3/2} (e - e_\rho)^{3/2}}{h^3 \rho^{5/2}}
\end{aligned} \tag{4.3.5}$$

where $\Omega(m) = 2\sqrt{\pi}^{\,m}\, \Gamma(\frac{m}{2})^{-1}$ is the surface of the unit sphere in $m$
dimensional space: and no confusion should arise between the energy density $e$
and the $e$ (equal to the base of the natural logarithms) arising from using
Stirling's formula for $N_n!$ and for the gamma function. The above
inequality is an essential consequence of the stability which, in this proof, is used
here for the first time.
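The surface of the unit sphere entering the bound (4.3.5) is easy to evaluate, but for the dimensions $m = 3N_n$ relevant here a direct evaluation of the gamma function overflows, so a logarithmic version is the practical one. The sketch below uses only the Python standard library; the function names are illustrative.

```python
import math


def unit_sphere_surface(m):
    """Omega(m) = 2 pi^(m/2) / Gamma(m/2): surface of the unit sphere in R^m."""
    return 2.0 * math.pi ** (m / 2) / math.gamma(m / 2)


def log_unit_sphere_surface(m):
    """log Omega(m), usable for very large m where Gamma(m/2) overflows."""
    return math.log(2.0) + (m / 2) * math.log(math.pi) - math.lgamma(m / 2)
```

For $m = 2$ and $m = 3$ this reproduces the circumference $2\pi$ and the area $4\pi$; for $m$ of the order of $10^6$ only the logarithmic form is finite in floating point.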


(C) Quasi Convexity of the Entropy in Finite Volume.


The function $\sigma_n(\rho,e)$ has the simple property that it is “almost” convex.
Suppose in fact that $\rho = \frac12(\rho_1+\rho_2)$ and $e = \frac12(e_1+e_2)$ with $\rho, \rho_1, \rho_2$ multiples
of $2^{-3(n+1)}L_0^{-3}$, $2^{-3n}L_0^{-3}$, $2^{-3n}L_0^{-3}$, respectively, and $e_1 > e_{\rho_1}$, $e_2 > e_{\rho_2}$ (hence $e > e_\rho$).
Then we can look at the box $B'_{n+1}$ and at 8 copies of $B'_n$ that fit in it
separated by a corridor of width $r_0$.

In four of the boxes $B'_n$ we put $N_{(1)} = \rho_1|B_n|$ particles in a configuration
with energy $\le e_1|B_n|$ and in the other four we put $N_{(2)} = \rho_2|B_n|$ particles
in a configuration with energy $\le e_2|B_n|$; then $N = 4(N_{(1)} + N_{(2)})$. Then we
shall have in $B'_{n+1}$ exactly $N = \rho|B_{n+1}|$ particles with energy $\le e|B_{n+1}|$
and clearly

$$N_0(\rho,e,B'_{n+1}) \ge N_0(\rho_1,e_1,B'_n)^4\, N_0(\rho_2,e_2,B'_n)^4 \tag{4.3.6}$$

where the $N!$ in the definition of $N_0$, see (4.3.4), is very important because in
deriving (4.3.6) one uses the fact that there are $\frac{N!}{N_{(1)}!^4\, N_{(2)}!^4}$ ways of selecting
which among the $N$ particles are put in each of the eight boxes.

Relation (4.3.6) can be written in terms of the $\sigma_n$ and it becomes:

$$\sigma_{n+1}(\rho,e) \ge \frac12 \big(\sigma_n(\rho_1,e_1) + \sigma_n(\rho_2,e_2)\big). \tag{4.3.7}$$

It will be convenient to have $\sigma_n(\rho,e)$ defined not only for dyadic $\rho$
(i.e. multiples of $2^{-3n}L_0^{-3}$) but for any real $\rho$. Since $\sigma_n(\rho,e)$ is actually
defined for $\rho = \frac{m}{2^{3n}} L_0^{-3}$ this can be simply achieved by defining $\sigma_n(\rho,e)$ for
the other values of $\rho$ by linear interpolation.

The linear interpolation is very convenient and natural because it satisfies
the bound (4.3.5), since the latter is convex in $\rho, e$; furthermore it has, by
(4.3.7), the further property that if $\sigma_n(\rho,e)$ is nondecreasing in $n \ge n_0$ for
each $\rho$ a multiple of $2^{-3n_0}L_0^{-3}$ then it is also nondecreasing in $n \ge n_0$ for
all $\rho$.


(D) Monotonicity of Entropy as Function of the Container Size


The property of $\sigma_n(\rho,e)$ of being nondecreasing in $n$, for dyadic $\rho$, is the
key property to the analysis; in fact since, as remarked above, the box $B'_{n+1}$
contains eight boxes $B'_n$ separated by a corridor of width $r_0$ we see (from
(4.3.6) with $\rho = \rho_1 = \rho_2$ and $e = e_1 = e_2$) that

$$\sigma_{n+1}(\rho,e) = \frac{1}{|B_{n+1}|} \log N_0(\rho,e,B'_{n+1}) \ge \frac{1}{|B_n|} \log N_0(\rho,e,B'_n) = \sigma_n(\rho,e) \tag{4.3.8}$$

where again we make essential use of the $N!$ in the definition (4.3.4) of
$N_0$, in the same way as in the derivation of (4.3.6). By (4.3.5) $\sigma_n(\rho,e)$ is
uniformly bounded in $n$.

Hence the limit as $n \to \infty$ of $\sigma_n(\rho,e)$ does exist for all $\rho$ dyadic and for
$e > e_\rho$. But the sequence $\sigma_n(\rho,e)$ will also converge for the nondyadic $\rho$'s
because $\sigma_n(\rho,e)$ is “almost convex” in the sense of (4.3.7), and in fact the
convergence will be uniform on every closed set in the interior of $A$. The
latter property follows from elementary considerations on convex functions
monotonically convergent to a limit. Clearly (4.3.7),(4.3.5) imply that the
limit function $\sigma(\rho,e)$ will be convex and bounded in the interior of the region
$A$. Hence it will be continuous in the same region.
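The monotonicity (4.3.8) and the quasi convexity (4.3.7) can be seen at work in the one case where everything is explicit, the ideal gas ($\Phi = 0$, so that no corridors are needed and $B'_n$ can be identified with $B_n$). The sketch below is an illustration under that assumption, with units $m = h = L_0 = 1$ by default and hypothetical function names; it uses the volume of the $3N$-dimensional ball for the momentum integral.

```python
import math


def log_N0_ideal(N, V, E, m=1.0, h=1.0):
    """log N0 for the ideal gas with K(p) <= E and q in V^N:
    N0 = V^N (2 pi m E / h^2)^(3N/2) / (Gamma(3N/2 + 1) N!)."""
    return (N * math.log(V)
            + 1.5 * N * math.log(2.0 * math.pi * m * E / h ** 2)
            - math.lgamma(1.5 * N + 1.0)
            - math.lgamma(N + 1.0))


def sigma_n(rho, e, n, L0=1.0):
    """Finite volume entropy (1/|B_n|) log N0 at density rho, energy density e.
    Assumes rho |B_n| is a positive integer (dyadic density, n large enough)."""
    V = (2.0 ** n * L0) ** 3
    N = round(rho * V)
    return log_N0_ideal(N, V, e * V) / V


def sigma_limit(rho, e, m=1.0, h=1.0):
    """Limit entropy: rho [log(1/rho) + (3/2) log(4 pi m e / (3 rho h^2)) + 5/2]."""
    return rho * (math.log(1.0 / rho)
                  + 1.5 * math.log(4.0 * math.pi * m * e / (3.0 * rho * h ** 2))
                  + 2.5)
```

With $\rho = 1$, $e = 1.5$ the values `sigma_n(1.0, 1.5, n)` increase with $n$ and approach `sigma_limit(1.0, 1.5)` from below, the difference being of order $\log N_n/|B_n|$.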


(E) Independence from the Special Sequences of Density Values and of
Growing Containers.


We now want to show that we can free ourselves from the special sequence
of boxes and of densities that we have considered.

Consider a family of cubic boxes $B$ with side $L \to \infty$ containing $N$
particles in configurations with energy $\le E$ and suppose that $\frac{N}{|B|} \to \rho$ and
$\frac{E}{|B|} \to e$ with $(\rho,e)$ in the interior of $A$. Below we use equivalently the
notation $V = |B| = L^3$.

Let $\rho_0 = \rho + \varepsilon$, with $\rho_0$ supposed dyadic, and let $e_0 = e - \eta$ with $\varepsilon, \eta$ small.
We can divide the box $B$ into boxes $B_n$ of side $L_n = 2^n L_0$. Their number
will be the cube of the integer part $[\frac{L}{L_n}]$ and they will cover a volume inside
$B$ whose complement has size $\le 8L^2L_n$. The corresponding slightly smaller
boxes $B'_n$ will therefore cover a volume

$$\Big[\frac{L}{L_n}\Big]^3 |B'_n| = \Big[\frac{L}{L_n}\Big]^3 |B_n| \Big(1 - \frac{r_0}{L_n}\Big)^3 \ge |B| \Big(1 - 8\frac{L_n}{L} - 8\frac{r_0}{L_n}\Big) \tag{4.3.9}$$

If $n, L$ are so large that

$$\Big(1 - 8\frac{L_n}{L} - 8\frac{r_0}{L_n}\Big) \rho_0 > \rho \tag{4.3.10}$$

we see that by filling each box $B'_n$ with $N_n = \rho_0|B_n|$ particles we would put
in $B$ more than $N$ particles ($N = \rho|B|$).

Hence if we fill each box $B'_n$ with $N_n$ particles until $[\frac{N}{N_n}] N_n$ particles are
located and if we put in one of the remaining boxes a number $< N_n$
of particles we shall have located exactly $N$ particles inside $B$. They are
put down in configurations with energy $\le e_0|B_n|$ in each box among the
first $[\frac{N}{N_n}]$ and in a regular but arbitrary configuration in the last box, to
cover as many points as necessary to reach the total correct number $N$ (for
instance on a square lattice with density higher than $\rho$).

Then it is clear that if $L$ is large enough we shall have filled the box $B$
with $N$ particles with total energy $\le E = eV$ and with overall density $\rho$;
at the same time we shall have shown the inequality

$$\frac{1}{|B|} \log N_0(\rho,e,B) \ge \frac{1}{|B|} \Big( \log N_0(\rho_0,e_0,B'_n)^{[N/N_n]} + C_n \Big) \tag{4.3.11}$$

where $C_n$ bounds the $\log N_0(\rho',e')$ coming from the partition function of the
particles in the last box: $C_n$ is the maximum of $\log N_0(\rho',e')$ for $\rho' = \frac{m}{|B_n|}$
for $1 \le m < N_n$ and $e'$ suitably large. Once more the $N!$ in the definition
of the microcanonical partition function (4.3.4) is essential.

This means, by taking into account the arbitrariness of $n$, that

$$\bar\sigma(\rho,e) \stackrel{def}{=} \liminf_{\substack{L\to\infty \\ N/V \to \rho,\ E/V \to e}} \frac{1}{|B|} \log N_0(\rho,e,B) \ge \frac{\rho}{\rho_0}\, \sigma_n(\rho_0,e_0) \xrightarrow[n\to\infty]{} \frac{\rho}{\rho_0}\, \sigma(\rho_0,e_0) \tag{4.3.12}$$

because $\frac{C_n}{|B|} \to 0$ and $\frac{|B_n|}{|B|} \big[\frac{N}{N_n}\big] \to \frac{\rho}{\rho_0}$ as $L \to \infty$.

Then the arbitrariness of $e_0, \rho_0$ as well as the continuity of the function
$\sigma(\rho,e)$ imply that $\bar\sigma(\rho,e) \ge \sigma(\rho,e)$.

But we can clearly repeat the argument by exchanging the role of $B$ and
$B_n$: namely we take a very large $L$ so that $\frac{1}{|B|} \log N_0(\rho,e,B)$ is very close
to the lim sup as $L \to \infty$, $\frac{N}{V} \to \rho$, $\frac{E}{V} \to e$ of $\frac{1}{|B|} \log N_0(\rho,e,B)$ and we take
a very large $n$ so that $L_n \gg L$ and show by the same type of argument that:

$$\sigma(\rho,e) \ge \limsup_{\substack{L\to\infty \\ N/V \to \rho,\ E/V \to e}} \frac{1}{|B|} \log N_0(\rho,e,B). \tag{4.3.13}$$

This will imply

$$\sigma(\rho,e) = \lim_{\substack{L\to\infty \\ N/V \to \rho,\ E/V \to e}} \frac{1}{|B|} \log N_0(\rho,e,B) \tag{4.3.14}$$

completing the proof of the existence of the microcanonical entropy over
sequences of cubic boxes.

The extension to sequences of parallelepipedal boxes with all sides tending
to $\infty$ at comparable speeds can be done along the same lines and we shall
not discuss it.


(F) Box-Shape Independence. 


We just mention that the argument can be perfected to show that the
limit (4.3.14) exists and equals $\sigma(\rho,e)$ over much more general sequences of
boxes.

Imagine, in fact, paving space with cubes of side $\ell > L_0$ (one of which
is always centered at the origin, to fix the arbitrariness due to translation
invariance); call $B_+(\ell)$ the union of the cubes of the pavement that contain
points of $B$, and $B_-(\ell)$ the union of the cubes of the pavement entirely
contained in $B$. Then we say that $B$ tends to $\infty$ in the sense of Fisher if
there exist $A, \alpha > 0$ such that:

$$\frac{|B_+(\ell)| - |B_-(\ell)|}{|B|} \le A \Big(\frac{\ell}{{\rm diam}\, B}\Big)^{\alpha} \tag{4.3.15}$$

This means that the box $B$ grows keeping the surface small compared to
the volume (homothetic growth of a box with smooth boundary trivially
satisfies (4.3.15) with $\alpha = 1$).

Then it can be shown that, for $(\rho,e)$ in the interior of $A$, (4.3.14) holds on
any sequence $B \to \infty$ in the sense of Fisher.

If the potential has a hard core the same argument as above applies except
that the interval of variability of $\rho$ is no longer $[0, +\infty)$ but $[0, \rho_{cp})$, i.e. it
is a finite interval ending at the close packing density.

The finite range condition can also be eliminated and replaced by the
temperedness condition (4.1.1).

A complete treatment of all these remaining cases can be found in [Fi64], 
[Ru69]. 

The function $\sigma(\rho,e)$ is trivially related to the function $s(v,u)$ of Ch.II:

$$s(v,u) = v\, \sigma(v^{-1}, v^{-1}u) \tag{4.3.16}$$

so that we see that $s(v,u)$ is convex in $v$, at fixed $u$, and in $u$ at fixed $v$.$^3$

3 In fact by taking the second $v$-derivative of $v f(v^{-1})$ one sees that the convexity of $f(\rho)$
implies that of $v f(v^{-1})$.

This is also physically interesting as it shows that the derivative $(\frac{\partial s}{\partial u})_v =
T^{-1}$ is monotonic nonincreasing in $u$ (“positivity of the specific heat” at
constant volume) and likewise behaves $(\frac{\partial s}{\partial v})_u = \frac{p}{T}$.

One can also see that the latter quantities are $\ge 0$, as demanded by the
physical interpretation of $T, p$ as, respectively, the absolute temperature and
pressure. In fact one can prove the relations:

$$\sigma_n(\rho, e + \lambda) \ge \sigma_n(\rho,e) + \rho \log\Big(1 + \frac{\lambda}{e - e_\rho}\Big)^{3/2}, \qquad \Big(\frac{\partial s}{\partial v}\Big)_u \ge 0 \tag{4.3.17}$$

which together with the convexity properties immediately imply the
positivity of $T, p$.
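The computation alluded to in footnote 3 is a one-line exercise; it is spelled out here for a twice differentiable $f$ (an assumption made only to keep the check elementary) and with $g$ as an auxiliary name for $v f(v^{-1})$:

```latex
\[
g(v) = v\, f(v^{-1}), \qquad
g'(v) = f(v^{-1}) - v^{-1} f'(v^{-1}),
\]
\[
g''(v) = -v^{-2} f'(v^{-1}) + v^{-2} f'(v^{-1}) + v^{-3} f''(v^{-1})
       = v^{-3} f''(v^{-1}).
\]
```

Since $v > 0$ the second derivative of $g$ at $v$ has the same sign as that of $f$ at $\rho = v^{-1}$, which is the statement of the footnote.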


(G) Continuity of the Pressure 


Consider the pressure as a function of the density at constant temperature.
This is a function that is most conveniently studied in the grand canonical
ensemble. Since pressure is a convex function of the chemical potential$^4$ $\lambda$
its derivative is monotonically nondecreasing (because convexity means that
the first derivative is nondecreasing); hence the graph of the derivative has,
at most, countably many upward jumps and countably many horizontal
plateaus. A vertical jump corresponds to a value of the chemical potential
where the right and left derivatives of the pressure are different, while a
horizontal plateau corresponds to a straight segment in the graph of the
pressure (drawing a few schematic graphs is very helpful here).

The definition of grand canonical partition function implies, by
differentiating it with respect to $\lambda$, that the $\lambda$-derivative of the pressure is the
density. Then to say that there are no horizontal plateaus in the graph of the
density as a function of the chemical potential is equivalent to saying that
the graph of the pressure as a function of the chemical potential contains no
straight segment.
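The two facts used above (the $\lambda$-derivative of $\log \Xi$ gives the mean particle number, and the second derivative is a variance, hence nonnegative) can be checked numerically on any finite sum $\sum_n c_n e^{\beta\lambda n}$ with $c_n \ge 0$. The sketch below uses arbitrary illustrative weights and hypothetical function names; it is not tied to a specific interaction.

```python
import math


def log_xi(lam, weights, beta=1.0):
    """log of sum_n c_n exp(beta * lam * n), c_n = weights[n] >= 0,
    evaluated in the numerically stable log-sum-exp form."""
    terms = [math.log(c) + beta * lam * n
             for n, c in enumerate(weights) if c > 0.0]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def number_statistics(lam, weights, beta=1.0):
    """Mean and variance of n under p_n proportional to c_n exp(beta*lam*n)."""
    log_norm = log_xi(lam, weights, beta)
    probs = [math.exp(math.log(c) + beta * lam * n - log_norm) if c > 0.0 else 0.0
             for n, c in enumerate(weights)]
    mean = sum(n * p for n, p in enumerate(probs))
    var = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
    return mean, var


def second_difference(f, x, step=1e-3):
    """Central finite-difference estimate of the second derivative of f at x."""
    return (f(x + step) - 2.0 * f(x) + f(x - step)) / step ** 2
```

With $\beta = 1$ the finite-difference second derivative of `log_xi` agrees with the variance returned by `number_statistics`, which makes the convexity of the pressure in $\lambda$ visible.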

A horizontal plateau in the graph of the density as a function of the
chemical potential means that there are several chemical potentials corresponding
to the same density.$^5$ Hence there are several pressures corresponding to
the same density, i.e. a vertical discontinuity in the graph of the pressure
as a function of the density.

4 Because $\beta p = \frac{1}{V} \log \Xi(\beta,\lambda)$ and $\Xi(\beta,\lambda)$, the grand canonical partition function, see
Ch.II, is $\Xi = \sum_{n=0}^{\infty} \frac{e^{\beta\lambda n}}{n!} Z_n(\beta)$ with $Z_n$ the canonical partition function for $n$
particles; hence in general the logarithm of a sum of quantities $c\, e^{\lambda n}$ with $c > 0$ is a convex
function of $\lambda$ as one checks that the second derivative of such a logarithm with respect to $\lambda$ is
nonnegative.

5 Note that the derivative of the pressure with respect to the chemical potential is the
density and, therefore, it should be strictly positive so that the pressure is strictly
increasing with the chemical potential. But one has to show that the density is positive
if the chemical potential is negative enough: this is not so easy and it will be shown
in complete generality in §5.9: see the sentence preceding (5.9.17) where the relation
$\rho = e^{\beta\lambda}(1 + O(e^{\beta\lambda}))$ is given and interpreted as saying that the density is proportional
to the activity $z = e^{\beta\lambda}$ at small activity i.e. at negative enough chemical potential.

It is therefore interesting to see that such discontinuities of the pressure 
as a function of the density at constant T are in fact not possible under 
the only condition that the potential is superstable; see Ch.V, (5.3.1) for a 
definition. 

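A simple proof, due to Ginibre, [Gi67], can be given for systems with hard
core or, alternatively, with purely repulsive pair potential. The general case
of a superstable potential is harder, [DM67], [Ru70].

Define the activity as $z \stackrel{def}{=} e^{\beta\lambda} \big(\sqrt{2\pi m \beta^{-1}}\big)^3 h^{-3}$, where the second factor
comes from integrating explicitly the kinetic part of the energy so that
the grand canonical partition function $\Xi$ (in volume $V$ and temperature
$T = (k_B\beta)^{-1}$) can eventually be written

$$\Xi = \sum_{n=0}^{\infty} \frac{z^n}{n!} Z_n$$

see below. Then we estimate

$$\chi^{-1} = \frac{\rho}{\beta} \frac{\partial \beta p}{\partial \rho} = \frac{\rho}{\beta} \frac{\partial \beta p}{\partial \beta\lambda} \frac{\partial \beta\lambda}{\partial \rho} = \frac{\rho}{\beta}\, z \frac{\partial \beta p}{\partial z}\, \frac{1}{z} \frac{\partial z}{\partial \rho}$$

at constant $\beta$: in the grand canonical ensemble it is $\beta p = \frac{1}{V} \log \Xi$. The
physical interpretation of $\chi$ is clearly that of isothermal compressibility.

If we can show that $\chi^{-1}$ can be bounded away from $+\infty$ for $z = e^{\beta\lambda}$ in any
finite interval then $\chi$ is bounded away from zero in any finite interval and
the graph of the pressure as a function of the density is continuous (in fact
Lipschitz continuous) and as a function of the chemical potential it cannot
contain any straight segment.

The grand canonical density is $\rho = \frac{\langle n \rangle}{V}$ with $\langle n \rangle = \sum_n n \frac{z^n}{n!} \frac{Z_n}{\Xi}$, where

$$Z_n = \int dx_1 \ldots dx_n\, e^{-\beta \sum_{i<j} \varphi(x_i - x_j)} \tag{4.3.18}$$

with $\Xi = \sum_{n=0}^{\infty} \frac{z^n}{n!} Z_n$. Hence, if $\beta p = V^{-1} \log \Xi$, then $\rho = z \frac{\partial \beta p}{\partial z}$;
furthermore $z \frac{\partial \rho}{\partial z} = \frac{\langle n^2 \rangle - \langle n \rangle^2}{V}$. Hence:

$$\chi^{-1} = \frac{\rho}{\beta} \frac{\partial \beta p}{\partial z} \frac{\partial z}{\partial \rho} = \frac{1}{\beta}\, \rho\, \frac{\langle n \rangle}{\langle n^2 \rangle - \langle n \rangle^2} \tag{4.3.19}$$

and we need a lower bound on $\frac{\langle n^2 \rangle - \langle n \rangle^2}{\langle n \rangle}$.

Consider systems with hard core interactions, i.e. with a pair potential
$\varphi(x - y)$ equal to $+\infty$ for $|x - y| < a$ for some $a > 0$, or systems with
repulsive potential $\varphi \ge 0$. Let $\varphi_+(q) = \varphi(q)$ if $\varphi(q) \ge 0$ and $\varphi_+(q) = 0$
otherwise.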

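The key remark is that $Z_{n+2}/Z_{n+1} \ge Z_{n+1}/Z_n - D$ for some volume
independent and continuous function $D = D(\beta)$. Hence in the hard core
case one can take $D = e^{\beta B} \int (1 - e^{-\beta\varphi_+(x)})\, dx < +\infty$ if $-B$ is a lower bound
for the energy of interaction $\Phi_1$ of one particle with any number of others.$^6$

6 Therefore $B < +\infty$ if there is a hard core or if $\varphi \ge 0$; while if the potential is of
Lennard-Jones type, or more generally a superstable potential, with an attractive part
there is no such $B$ and the theory is more difficult, see [Ru70].

Setting $\Phi(X)$ the potential energy of the configuration $X$ this
follows from the Schwarz inequality; abbreviating $(x_1,\ldots,x_n) \stackrel{def}{=} X$ and
$\Phi_1(X,x) \stackrel{def}{=} \sum_{j=1}^{n} \varphi(x_j - x) \ge -B$, we find:

$$\begin{aligned}
Z_{n+1}^2 & = \Big( \int dX\, dx\, e^{-\beta\Phi(X) - \beta\Phi_1(X,x)} \Big)^2 \le \\
& \le \Big( \int dX\, e^{-\beta\Phi(X)} \Big) \Big( \int dX\, dx\, dy\, e^{-\beta\Phi(X) - \beta\Phi_1(X,x) - \beta\Phi_1(X,y)} \Big) = \\
& = Z_n \int dX\, dx\, dy\, e^{-\beta\Phi(X,x,y)} \big( e^{\beta\varphi(x-y)} - 1 + 1 \big) = \\
& = Z_n Z_{n+2} + Z_n \int dX\, dx\, e^{-\beta\Phi(X,x)} \int dy\, e^{-\beta\Phi_1(X,y)} \big( 1 - e^{-\beta\varphi(x-y)} \big) \le \\
& \le Z_n Z_{n+2} + Z_n \int dX\, dx\, e^{-\beta\Phi(X,x)}\, e^{\beta B} \int dy\, \big( 1 - e^{-\beta\varphi_+(x-y)} \big) = \\
& = Z_n Z_{n+2} + Z_n Z_{n+1} D \quad \Longrightarrow \quad \frac{Z_{n+2}}{Z_{n+1}} \ge \frac{Z_{n+1}}{Z_n} - D
\end{aligned} \tag{4.3.20}$$

where $D = e^{\beta B} \int (1 - e^{-\beta\varphi_+(y)})\, dy$. Therefore, using again the Schwarz
inequality (plus the normalization property $\sum_n p_n = 1$ of $p_n = \frac{z^n}{n!} \frac{Z_n}{\Xi}$, and the
identity $n p_n = z \frac{Z_n}{Z_{n-1}} p_{n-1}$):

$$\big( (1 + zD) \langle n \rangle \big)^2 = \Big( \sum_n p_n\, z \Big( \frac{Z_{n+1}}{Z_n} + nD \Big) \Big)^2 \le z^2 \sum_n p_n \Big( \frac{Z_{n+1}}{Z_n} + nD \Big)^2 \tag{4.3.21}$$

so that developing the square the r.h.s. becomes

$$z^2 \sum_n p_n \Big( \frac{Z_{n+1}^2}{Z_n^2} + 2nD \frac{Z_{n+1}}{Z_n} + n^2 D^2 \Big)$$

and using the last of (4.3.20) this is bounded above, if $p_n = (z^n/n!)(Z_n/\Xi)$,
by

$$\begin{aligned}
& \sum_{n=0}^{\infty} \frac{z^n}{n!} \frac{1}{\Xi} \big( z^2 Z_{n+2} + z^2 D Z_{n+1} + 2z^2 D n Z_{n+1} + n^2 z^2 D^2 Z_n \big) = \\
& = \sum_{n=0}^{\infty} \big( (n+1)(n+2) p_{n+2} + zD(n+1) p_{n+1} + 2zDn(n+1) p_{n+1} + z^2 D^2 n^2 p_n \big) = \\
& = \langle n^2 \rangle - \langle n \rangle + zD \langle n \rangle + 2zD \big( \langle n^2 \rangle - \langle n \rangle \big) + z^2 D^2 \langle n^2 \rangle = \\
& = (1 + zD)^2 \langle n^2 \rangle - (1 + zD) \langle n \rangle
\end{aligned}$$

hence one finds

$$\frac{\langle n^2 \rangle - \langle n \rangle^2}{\langle n \rangle} \ge \frac{1}{1 + zD} \tag{4.3.22}$$

which proves a lower bound on $\chi$ and, therefore, the continuity of the
pressure as a function of the density at constant temperature.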
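Both the key inequality $Z_{n+2}/Z_{n+1} \ge Z_{n+1}/Z_n - D$ and the fluctuation bound (4.3.22) can be tested on an exactly solvable example. The sketch below uses a one-dimensional gas of hard rods (an assumption made purely for illustration: the argument above does not depend on the dimension, and for rods of length $a$ with centers in $[0, L]$ one has $Z_n = (L - (n-1)a)^n$, $B = 0$ and $D \le 2a$). Function names are hypothetical.

```python
import math


def hard_rod_Z(n, length, a):
    """Configuration integral of n hard rods of length a with centers in
    [0, length]: Z_n = (length - (n-1) a)^n, or 0 if they do not fit.
    The 1/n! is kept outside, as in (4.3.18)."""
    if n == 0:
        return 1.0
    free = length - (n - 1) * a
    return free ** n if free > 0.0 else 0.0


def occupation_probabilities(z, length, a):
    """Grand canonical probabilities p_n = z^n Z_n / (n! Xi)."""
    n_max = int(length / a) + 1
    weights = [z ** n / math.factorial(n) * hard_rod_Z(n, length, a)
               for n in range(n_max + 1)]
    xi = sum(weights)
    return [w / xi for w in weights]


def fluctuation_ratio(z, length, a):
    """(<n^2> - <n>^2) / <n> in the grand canonical ensemble."""
    probs = occupation_probabilities(z, length, a)
    mean = sum(n * p for n, p in enumerate(probs))
    second = sum(n * n * p for n, p in enumerate(probs))
    return (second - mean ** 2) / mean
```

For $L = 10$, $a = 1$ the ratio returned by `fluctuation_ratio` stays above $1/(1 + 2az)$ at small, moderate and large activity, as (4.3.22) requires.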


§5.1. Virial Theorem, Virial Series and van der Waals Equation


Van der Waals theory is one of the earliest and simplest applications of 
classical statistical mechanics (1873: see [VW88]). Nevertheless it brings 
up one more of its conceptual problems, although not as deep as the critical 
problems of Chap.III. It clearly indicates that one has to give up the naive 
hope that the theory of phase transitions and phase coexistence could be 
easily quantitatively accessible. 

The classical approach starts from the virial theorem (Clausius). Consider a 
real gas with N identical particles with mass m in a spherical (for simplicity) 
container with volume V; suppose that the microscopic interaction potential 
between two particles at distance r is a Lennard-Jones potential: 


$$\varphi(r) = 4\varepsilon \big( (r_0/r)^{12} - (r_0/r)^{6} \big) \tag{5.1.1}$$

where $\varepsilon$ is the interaction strength and $r_0$ is the diameter of the molecules.
Let the force acting on the $i$-th particle be $f_i$; multiplying both sides of
the equations of motion $m\ddot q_i = f_i$ by $-\frac12 q_i$ we find

$$-\frac12 \sum_{i=1}^{N} m\, q_i \cdot \ddot q_i = -\frac12 \sum_{i=1}^{N} q_i \cdot f_i \stackrel{def}{=} \frac12 C(q) \tag{5.1.2}$$

and the quantity $C(q)$ is the virial of the forces in the configuration $q$; note
that $C(q)$ is not translation invariant because of the presence of the forces
due to the walls: writing the force $f_i$ as a sum of the internal forces and of
the external forces, due to the walls, the virial $C$ can be expressed naturally
as sum of the virial $C_{int}$ of the internal forces (translation invariant) and
of the virial $C_{ext}$ of the external forces. By dividing both sides by $\tau$ and
integrating over the time interval $[0,\tau]$ one finds, in the limit $\tau \to +\infty$,

$$\langle T \rangle = \frac12 \langle C \rangle \tag{5.1.3}$$

which is read by saying that the average kinetic energy equals half the
average virial of the forces.

The virial naturally splits as the sum of the virial due to the internal forces
$C_{int}$ and that due to the external ones $C_{ext}$. The virial of the external forces
is simply

$$\langle C_{ext} \rangle = 3pV \tag{5.1.4}$$

where $p$ is the pressure and $V$ the volume. Equations (5.1.3) and (5.1.4)
constitute the virial theorem of Clausius.

A quick proof of (5.1.4) is that the external forces act only on the boundary
of the (spherical) box $B$ containing the system: they send back into the
container particles that try to get out. The average force that a surface
element $d\sigma$ of the walls exerts on the colliding particles is $-p\, n_e\, d\sigma$ if $p$
is the pressure and $n_e$ the outer normal. Hence $\langle C_{ext} \rangle = p \int_{\partial B} d\sigma_\xi\, \xi \cdot n_e(\xi) =
p\, 3V$ by Green's volume formula. A more refined argument leading to the
same result is possible.$^1$

Since the average kinetic energy is $\frac32 \beta^{-1} N$ we see that the equation of state
is:

$$\beta p v = 1 - \frac{\beta \langle C_{int} \rangle}{3N} \tag{5.1.5}$$

if $\beta^{-1} = k_B T$ and $T$ is the absolute temperature and $v$ the specific volume.
The two relations (5.1.3) and (5.1.4) together with their corollary (5.1.5)
constitute Clausius' virial theorem. The equation (5.1.5) is essentially the
equation of state. In the case of no internal forces it yields $\beta p v = 1$, the
ideal gas equation.

Van der Waals first used the virial theorem to perform an actual
computation of the corrections. Note that the internal virial $C_{int}$ can be written,

$$C_{int} = \sum_{i=1}^{N} \sum_{j \ne i} \partial_{q_i} \varphi(q_i - q_j) \cdot q_i = \sum_{i<j} \partial_q \varphi(q_i - q_j) \cdot (q_i - q_j) \tag{5.1.6}$$

which shows that the contribution to the virial by the internal repulsive
forces is negative while that of the attractive forces is positive. To evaluate
the average of (5.1.6) we simply use the theory of the ensembles and choose
to use the canonical ensemble, as it is more convenient.


1 The force due to the spherical container boundary can be represented as:

$$f(x) = -\int_{\partial B} d\sigma_\xi\, n^e(\xi)\, F(x - \xi) \tag{a}$$

where $F$ is a nonnegative scalar, $n^e(\xi)$ is the outer normal to the boundary $\partial B$ of $B$ at
$\xi$. The function $F$ is not zero and very intense only in a very tiny region near the origin,
so that $f(x)$ is not zero only very close to the boundary. We are really interested in the
limiting case in which the force $F$ is a Dirac $\delta$-function, which represents the ideal case
of a perfect wall with no width.

The virial of the external forces $f$ necessary to confine the system inside the box $B$ is

$$-\Big\langle \sum_i x_i \cdot f(x_i) \Big\rangle = \int_{\partial B} d\sigma_\xi\, n^e(\xi) \cdot \xi\, \Big\langle \sum_i F(x_i - \xi) \Big\rangle \tag{b}$$

where $\langle \cdot \rangle$ denotes the time average, and having replaced $n^e(\xi) \cdot x_i$ by $n^e(\xi) \cdot \xi$ because of
the locality property of $F$ (exact if $F$ is a delta function). The average $\langle \sum_i F(x_i - \xi) \rangle$
is $\xi$ independent because of the assumed spherical symmetry and it represents the force
exerted per unit surface area near $\xi$, i.e. it is the pressure $p$ so that the average virial is
$p \int_{\partial B} d\sigma_\xi\, n^e(\xi) \cdot \xi = 3pV$. For an extension to nonspherical containers see [MP72].

One could proceed by using time averages and the ergodic hypothesis,
i.e. the microcanonical ensemble, but the result would be the same. One
could also proceed by simply taking the few time averages that we really
need and argue that their value should coincide with the one we calculate
assuming the gas essentially free and adding corrections that take into account
that close particles interact. This would be far weaker than the ergodic
hypothesis and it was the path followed by van der Waals; however, unlike the
equidistribution assumption, it does not lead easily to a systematic power
series expansion in $v^{-1}$ of the corrections.

In the canonical ensemble the average internal virial is, taking into account 
the symmetry in g, and denoting ®,, (ds dy) = ae p(a, — CPE 


(one @ J dq, dq, - - Ba Gyr Gy) BE rx). 
N! (5.1.7) 


which can be rewritten (using an integration by parts) as 


(se ee / dq, dq, .. Ba (Gyr Gy) BPI), 


2 N! 
Fb, CE. eee I th 5 (5.1.8) 
D CE EE A dy )—-BP Gy) 
ap am ae (aura, a c ACE R 
2 (N —3)! 


A (eP?) = DICA == d) A a, P == 42) 

where the er (4,42) has been replaced, before integrating by parts, by 
On, (ef Pa, =) _ 1) to avoid boundary contributions in the integrations (in 
fact e PF) is 1 at q = œ when, eventually, we take the limit as V — co). 
To rewrite (5.1.8) in a better form it is useful to introduce the notion 
of correlation function: the k-points correlation function $\rho(q_1,\ldots,q_k)$ is 
defined so that $\rho(q_1,\ldots,q_k)\,\frac{dq_1\cdots dq_k}{k!}$ is the probability of finding k particles 
in the infinitesimal volume elements $dq_1,\ldots,dq_k$. Hence, in the canonical 
ensemble: 

$$\rho(q_1,\ldots,q_k) = \frac{1}{Z_N}\int \frac{dq_{k+1}\cdots dq_N}{(N-k)!}\; e^{-\beta\Phi(q_1,\ldots,q_N)} \qquad (5.1.9)$$

where the normalization $Z_N = \int \frac{dq_1\cdots dq_N}{N!}\, e^{-\beta\Phi(q_1,\ldots,q_N)}$ is the (configurational) canonical partition function. Note that 
$\rho(q_1,\ldots,q_k)$ is not normalized to 1; in fact $\int \rho(q_1,\ldots,q_k)\,dq_1\cdots dq_k = N(N-1)\cdots(N-k+1)$. 
It is a simple matter of algebra to check that 
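$$\langle C_{int}\rangle = \frac{3}{2\beta}\int dq_1\,dq_2\, \big(e^{-\beta\varphi(q_1-q_2)} - 1\big)\, e^{\beta\varphi(q_1-q_2)}\, \rho(q_1,q_2) - \frac{1}{2}\int dq_1\,dq_2\,dq_3\, \big(e^{-\beta\varphi(q_1-q_2)} - 1\big)\, e^{\beta\varphi(q_1-q_2)}\, \rho(q_1,q_2,q_3)\, \partial_{q_1}\varphi(q_1-q_3)\cdot(q_1-q_2) \qquad (5.1.10)$$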
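It can be shown, see §5.9 below, that the correlation functions of order k 
are, for $\rho$ small, analytic functions of $\rho$ divisible by $\rho^k$ and proportional to 
$e^{-\beta\Phi(q_1,\ldots,q_k)}$; in fact, writing $q = (q_1,\ldots,q_k)$, 


$$\rho(q_1,\ldots,q_k) = \rho^k\, e^{-\beta\Phi(q_1,\ldots,q_k)}\,\big(1 + \rho F_1(q) + \rho^2 F_2(q) + \ldots\big) \qquad (5.1.11)$$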


so that (5.1.10) can be used as a starting point for a systematic expansion of 
the equation of state in powers of $\rho$. We do not discuss here the possibility 
of the expansion (5.1.11), not because it is difficult, but because it would 
lead us into a technical question that certainly was not worrying people at 
the time the above analysis was performed; we defer it to §5.9. 

The physical meaning of the correlation functions of order k shows that 
they should be proportional to $\rho^k$ and their definition (5.1.9) shows that 
they ought also to be proportional to $e^{-\beta\Phi(q_1,\ldots,q_k)}$. Hence it is quite clear 
that unless some integrals diverge, (5.1.10) already allows us to evaluate the 
first correction to the gas law. We simply neglect the third order term in 
the density and use $\rho(q_1,q_2) = \rho^2 e^{-\beta\varphi(q_1-q_2)}$ in the second order term. 

But there is no apparent reason for the integrals to diverge: they contain 
the factors $\big(e^{-\beta\varphi(q_1-q_2)} - 1\big)$ and $\partial_{q_1}\varphi(q_1-q_3)$ which tend to zero at large 
arguments so that the divergence sources should be quite subtle. About a 
hundred years after the original work of van der Waals the actual conver- 
gence of the series in (5.1.11) and of the virial series has been mathematically 
proved. We shall discuss it from a modern viewpoint in the following §5.9. 


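Then 
$$\frac{1}{2}\langle C_{int}\rangle = V\rho^2\,\frac{3}{2\beta}\,I(\beta) + V\,O(\rho^3) \qquad (5.1.12)$$


where $I(\beta) = \frac{1}{2}\int \big(e^{-\beta\varphi(q)} - 1\big)\,d^3q$ and the equation of state (5.1.5) becomes 
$$pv + \frac{I(\beta)}{\beta v} + O(v^{-2}) = \beta^{-1}.$$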

The calculation of I can be performed approximately if $\beta\varepsilon \ll 1$ (i.e. at 
“high temperature”), by imagining that $\varphi(r) = +\infty$ (i.e. $e^{-\beta\varphi(r)} - 1 = -1$) 
for $r < r_0$ and $e^{-\beta\varphi(r)} - 1 \simeq -\beta\varphi(r)$ for $r > r_0$. One has: 


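$$I \simeq -\frac{1}{2}\int_0^{r_0} 4\pi r^2\,dr - \frac{\beta}{2}\int_{r_0}^{\infty} \varphi(r)\,4\pi r^2\,dr = -4v_0 + \frac{32}{3}\beta\varepsilon v_0 \equiv -(b - \beta a) \qquad (5.1.13)$$
with 
$$v_0 = \frac{4\pi}{3}\Big(\frac{r_0}{2}\Big)^3, \qquad b = 4v_0, \qquad a = \frac{32}{3}\varepsilon v_0. \qquad (5.1.14)$$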
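Then it follows that $pv + \frac{a}{v} - \frac{b}{\beta v} = \frac{1}{\beta} + O(v^{-2})$ so that 
$$\Big(p + \frac{a}{v^2}\Big)v = \Big(1 + \frac{b}{v}\Big)\frac{1}{\beta} + O(v^{-2}) = \frac{1}{1 - \frac{b}{v}}\,\frac{1}{\beta} + O(v^{-2}) \qquad (5.1.15)$$


or $\big(p + \frac{a}{v^2}\big)(v - b)\beta = 1 + O(v^{-2})$, which gives the equation of state up to 
$O(v^{-2})$ and for $\beta\varepsilon \ll 1$, i.e. at high temperature and low density. 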
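
The arithmetic behind (5.1.13) and (5.1.14) can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes the Lennard-Jones form $\varphi(r) = 4\varepsilon((r_0/r)^{12} - (r_0/r)^6)$ for (5.1.1), works in units $r_0 = \varepsilon = 1$, and uses hypothetical helper names (`phi_lj`, `midpoint`, `I_model`, `I_closed_form`). It integrates the model integrand ($-1$ inside the core, $-\beta\varphi$ outside) and compares the result with $-(b - \beta a)$.

```python
import math

def phi_lj(r, eps=1.0, r0=1.0):
    """Lennard-Jones potential, the form assumed here for (5.1.1)."""
    x = (r0 / r) ** 6
    return 4.0 * eps * (x * x - x)

def midpoint(f, a, b, n):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def I_model(beta, eps=1.0, r0=1.0, rmax=50.0, n=200_000):
    """Model integral of (5.1.13): integrand -1 for r < r0, -beta*phi for r > r0."""
    core = midpoint(lambda r: -4.0 * math.pi * r * r, 0.0, r0, 2_000)
    tail = midpoint(
        lambda r: -beta * phi_lj(r, eps, r0) * 4.0 * math.pi * r * r,
        r0, rmax, n,
    )
    return 0.5 * (core + tail)

def I_closed_form(beta, eps=1.0, r0=1.0):
    """Closed form -(b - beta*a) with v0, b, a as in (5.1.14)."""
    v0 = 4.0 * math.pi / 3.0 * (r0 / 2.0) ** 3
    b = 4.0 * v0
    a = 32.0 / 3.0 * eps * v0
    return -(b - beta * a)

if __name__ == "__main__":
    for beta in (0.05, 0.1, 0.2):
        print(beta, I_model(beta), I_closed_form(beta))
```

The two columns agree up to quadrature and truncation errors. The sketch checks only the arithmetic of the model integrand; it says nothing about how well that model approximates the exact $I(\beta)$.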

It is in fact possible to compute, or at least to give integral representations 
of the coefficients of arbitrary order of the virial series: 


$$\beta p = v^{-1} + \sum_{p=2}^{\infty} c_p(\beta)\, v^{-p} \qquad (5.1.16)$$


and one can even show that the series converges for $\beta$ small and v large 
(i.e. high temperature and small density) if the stability and temperedness 
conditions discussed in Chap.II, Chap.IV, (4.1.1) hold. 

Equation (5.1.15) can be compared with a well-known empirical equation 
of state, the Van der Waals equation: 


$$\beta(p + a/v^2)(v - b) = 1 \quad\text{or}\quad (p + An^2/V^2)(V - nB) = nRT \qquad (5.1.17)$$
where, denoting Avogadro’s number by $N_A$, 
$$A = aN_A^2, \qquad B = bN_A, \qquad R = k_B N_A, \qquad n = N/N_A. \qquad (5.1.18)$$


It is clear that (5.1.15) and (5.1.17) coincide up to quantities of $O(v^{-2})$; 
hence (once an explicit form for $\varphi$ like (5.1.1) has been assumed as a good 
description of the system) (5.1.18), (5.1.14) show us how it is possible to 
access the microscopic parameters $\varepsilon$ and $r_0$ of the potential $\varphi$ via measure- 
ments detecting deviations from the Boyle-Mariotte law $\beta pv = 1$ of the 
rarefied gases: 


$$\varepsilon = 3a/8b = 3A/8BN_A, \qquad r_0 = (3b/2\pi)^{1/3} = (3B/2\pi N_A)^{1/3}. \qquad (5.1.19)$$
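
As an illustration of (5.1.19), the conversion between the macroscopic constants A, B and the microscopic parameters can be written in a few lines. This is a sketch under stated assumptions: SI units are used, the numerical inputs `A_example` and `B_example` are hypothetical values of a plausible order of magnitude (they are not taken from the table of §1.2), and the function names are illustrative.

```python
import math

N_A = 6.02214076e23   # Avogadro's number, 1/mol
K_B = 1.380649e-23    # Boltzmann's constant, J/K

def microscopic_parameters(A, B):
    """(eps in J, r0 in m) from A in J m^3/mol^2 and B in m^3/mol, via (5.1.19)."""
    eps = 3.0 * A / (8.0 * B * N_A)
    r0 = (3.0 * B / (2.0 * math.pi * N_A)) ** (1.0 / 3.0)
    return eps, r0

def vdw_constants(eps, r0):
    """Inverse map (A, B) from (eps, r0), via (5.1.14) and (5.1.18)."""
    v0 = 4.0 * math.pi / 3.0 * (r0 / 2.0) ** 3
    b = 4.0 * v0
    a = 32.0 / 3.0 * eps * v0
    return a * N_A ** 2, b * N_A

# Hypothetical inputs, for illustration only.
A_example = 0.136     # J m^3 / mol^2 (assumed value)
B_example = 3.2e-5    # m^3 / mol     (assumed value)

eps_example, r0_example = microscopic_parameters(A_example, B_example)

if __name__ == "__main__":
    print("eps / k_B in kelvin:", eps_example / K_B)
    print("r0 in metres:", r0_example)
```

Applying `vdw_constants` to the output of `microscopic_parameters` returns the original pair (A, B), which is a quick consistency check of (5.1.14), (5.1.18) and (5.1.19) taken together.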


Equation (5.1.17) is, however, empirically used beyond its validity region 
(very large v, i.e. very small density) by regarding A, B as phenomenolog- 
ical parameters to be experimentally determined by measuring them near 
generic values of p, V, T. The result is that the values of A, B do not “usually 
vary too much” and, apart from this small variability of A, B as functions of 
v, T, the predictions of (5.1.17) have been in reasonable agreement with ex- 
perience until, as the precision of the experiments increased over the years, 
serious inadequacies eventually emerged. 




A striking prediction of (5.1.17), taken literally, is that the gas undergoes 
a “gas-liquid” phase transition with a critical point at a temperature $T_c$, 
volume $v_c$ and pressure $p_c$ that can be computed via (5.1.17) and are given 
by (see §1.2, table (1.1)) 


$$RT_c = 8A/27B, \qquad V_c = 3B \qquad (n = 1). \qquad (5.1.20)$$


The critical temperature is defined as the largest value $T_c$ of the temperature 
for which the graph of p as a function of v is not monotonic decreasing; the 
critical volume $V_c$ is the value of v at the horizontal inflection point occurring 
for $T = T_c$. 
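
The values (5.1.20) follow from a short computation, sketched here for n = 1 under the usual assumption that the critical point is the horizontal inflection point of the isotherm of (5.1.17). The value of $p_c$ is not displayed in (5.1.20) and is added only for completeness.

```latex
\begin{aligned}
p &= \frac{RT}{V-B} - \frac{A}{V^2}, \\
\frac{\partial p}{\partial V} &= -\frac{RT}{(V-B)^2} + \frac{2A}{V^3} = 0, \qquad
\frac{\partial^2 p}{\partial V^2} = \frac{2RT}{(V-B)^3} - \frac{6A}{V^4} = 0, \\
\frac{V_c - B}{2} &= \frac{V_c}{3} \;\Longrightarrow\; V_c = 3B, \qquad
RT_c = \frac{2A\,(V_c-B)^2}{V_c^3} = \frac{8A}{27B}, \qquad
p_c = \frac{RT_c}{2B} - \frac{A}{9B^2} = \frac{A}{27B^2}.
\end{aligned}
```

The third line is obtained by dividing the first condition by the second, which eliminates both RT and A.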

At the same time this is very interesting as it shows that there are sim- 
ple relations among the critical parameters and the microscopic interaction 
constants ($\varepsilon \simeq k_B T_c$ and $r_0 \simeq (V_c/N_A)^{1/3}$): 


$$\varepsilon = 81\,k_B T_c/64, \qquad r_0 = (V_c/2\pi N_A)^{1/3} \qquad (5.1.21)$$


if the model (5.1.1) is used for the interaction potential $\varphi$, see the table in 
§1.2. 

On the other hand, (5.1.17) cannot be accepted uncritically, not only be- 
cause in its derivation we made various approximations (essentially neglect- 
ing $O(v^{-2})$ in the equation of state), but mainly because for $T < T_c$ the 
function p is no longer monotonic in v, while the pressure is a thermodynamic 
function that in Chap.IV and Chap.II has been shown to be monotonic non- 
increasing as a consequence of the very general convexity of the free energy, 
evaluated for instance in the canonical ensemble $f_c(\beta,v)$, as a function of 
v, i.e. $\partial^2 f_c/\partial v^2 \ge 0$ so that $-\partial p/\partial v = \partial^2 f_c/\partial v^2 \ge 0$. 

If, nevertheless, the isotherms of (5.1.17) are taken seriously even for $T < T_c$, 
by interpreting them as describing metastable states, then the “correct” 
equation of state can be obtained by noting that p as a function of v has 
a horizontal plateau $[v_l, v_g]$ in the situations of Fig. 5.1.1. Here the 
plateau associated with the represented isotherm is drawn; hence the density 
undergoes a jump from $v_l^{-1}$ to $v_g^{-1}$ as the pressure decreases and $v_l, v_g$ are 
interpretable as the specific volumes of the liquid and of the gas. 

The horizontal plateau must be drawn so that the areas $\gamma$, $\delta$ are equal. 
The reason is that the reversible thermodynamic cycle obtained by having 
the system go through a sequence of transformations along the plateau and 
back along the curved parts of the isotherm would yield an output of work 
represented by the difference between the areas (if run in an appropriate 
direction). However it would be a Carnot cycle at constant temperature 
which, by the second principle of Thermodynamics, should instead yield 0 
work. 
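
The equal-area rule can be turned into a small numerical procedure. The following is a sketch under stated assumptions: the isotherm is (5.1.17) written in reduced units $p/p_c$, $v/v_c$, $T/T_c$, where it reads $p = 8t/(3v-1) - 3/v^2$ (this uses (5.1.20) together with $p_c = A/27B^2$); the reduced temperature satisfies $27/32 < t < 1$, so that the pressure stays positive along the loop; all function names are illustrative.

```python
import math

def p_reduced(v, t):
    """Van der Waals isotherm in reduced units (p/p_c, v/v_c, T/T_c)."""
    return 8.0 * t / (3.0 * v - 1.0) - 3.0 / (v * v)

def dp_dv(v, t):
    """Derivative of the reduced isotherm with respect to v."""
    return -24.0 * t / (3.0 * v - 1.0) ** 2 + 6.0 / v ** 3

def bisect(f, a, b, iters=200):
    """Root of f in [a, b], assuming f(a) and f(b) have opposite signs."""
    fa = f(a)
    for _ in range(iters):
        m = 0.5 * (a + b)
        fm = f(m)
        if (fa > 0.0) == (fm > 0.0):
            a, fa = m, fm
        else:
            b = m
    return 0.5 * (a + b)

def area_mismatch(P, t, v1, v2):
    """Integral of (p - P) dv between the liquid and gas roots of p(v) = P."""
    v_big = (8.0 * t / P + 1.0) / 3.0 + 1.0   # here p(v_big) < P
    vl = bisect(lambda v: p_reduced(v, t) - P, 1.0 / 3.0 + 1e-9, v1)
    vg = bisect(lambda v: p_reduced(v, t) - P, v2, v_big)

    def primitive(v):
        # Antiderivative of the reduced isotherm.
        return 8.0 * t / 3.0 * math.log(3.0 * v - 1.0) + 3.0 / v

    return primitive(vg) - primitive(vl) - P * (vg - vl), vl, vg

def maxwell_plateau(t):
    """Plateau pressure and end points (P, v_l, v_g) for 27/32 < t < 1."""
    v1 = bisect(lambda v: dp_dv(v, t), 1.0 / 3.0 + 1e-9, 1.0)  # local minimum of p
    v2 = bisect(lambda v: dp_dv(v, t), 1.0, 100.0)             # local maximum of p
    lo, hi = p_reduced(v1, t), p_reduced(v2, t)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        mismatch, vl, vg = area_mismatch(mid, t, v1, v2)
        if mismatch > 0.0:
            lo = mid      # plateau too low: area above exceeds area below
        else:
            hi = mid
    return mid, vl, vg

if __name__ == "__main__":
    print(maxwell_plateau(0.9))
```

The outer bisection moves the trial plateau until the two areas cut off by it on the loop are equal. The inner bisections locate the end points of the plateau on the two stable branches of the isotherm.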

This is the well-known Maxwell construction that, as we see, is motivated 
in a rather obscure way because it is not clear whether it is really possible to 
perform the above Carnot cycle since it is at least doubtful, [LR69], that the 
intermediate states with p increasing with v could be realized experimentally 
or even be theoretically possible (see, however, the theorem in §5.2). 

The van der Waals equation, refined and complemented by Maxwell’s rule, 
nevertheless provides a simple picture for the understanding of the liquid-gas 
transition in statistical mechanics. But it predicts the following behavior: 


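$$(p - p_c) \propto (V - V_c)^{\delta}, \quad \delta = 3, \quad T = T_c; \qquad (v_g - v_l) \propto (T_c - T)^{\beta}, \quad \beta = 1/2, \quad \text{for } T \to T_c^- \qquad (5.1.22)$$

which are in sharp contrast with the experimental data gathered in the 
twentieth century. For the simplest substances one finds instead $\delta \cong 5$, $\beta \cong 
1/3$. 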
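
The exponents in (5.1.22) can be read off from an expansion of (5.1.17) around the critical point. The following is a sketch in reduced variables, which are an assumption of this illustration: $\pi = p/p_c$, $v/v_c = 1 + \phi$, $T/T_c = 1 + \tau$, with the plateau located, to leading order, by the fact that the $\phi$-dependent terms kept below are odd in $\phi$.

```latex
\begin{aligned}
\pi &= \frac{8(1+\tau)}{2+3\phi} - \frac{3}{(1+\phi)^2}
     = 1 + 4\tau - 6\tau\phi - \tfrac{3}{2}\phi^3 + O(\tau\phi^2,\phi^4), \\
\tau = 0:&\quad \pi - 1 \simeq -\tfrac{3}{2}\phi^3 \;\Longrightarrow\; \delta = 3, \\
\tau < 0:&\quad -6\tau\phi - \tfrac{3}{2}\phi^3 = 0 \;\Longrightarrow\; \phi = \pm 2\sqrt{-\tau}
   \;\Longrightarrow\; \frac{v_g - v_l}{v_c} \simeq 4\sqrt{-\tau}, \quad \beta = \tfrac{1}{2}.
\end{aligned}
```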

An accurate measurement of $\delta$ and $\beta$ is very delicate and this explains why, 
for a long time, the equation of van der Waals has been considered a “good 
representation” not only for a high-temperature low-density gas regime but 
for the liquid-gas transition regime as well. To gain an idea of the orders of 
magnitude of the constants A, B, hence of the microscopic interaction data, 
see the table at the end of §1.2. 

One should stress that the disagreement between theory and experiment 
that we are discussing has a rather different meaning and implications if 
compared with the discussions in Chap.III. The disagreement here is due to 
bad approximations (such as having neglected higher-order corrections in 
$v^{-1}$ in (5.1.15) or such as having assumed that the virial series converged 
even for values of v, T close to the critical point). 

Here the disagreement does not involve fundamental questions on the foun- 
dations of the theory: it only involves the analysis of whether a certain 
approximation is reasonable or correct, or not. 

One should remark, last but not least, that the blind faith in the equation of 
state (5.1.17) is untenable also because of another simple remark: nothing in 
the above analysis would change if the space dimension was d = 2 or d= 1: 
but in the last case, d = 1, one can easily prove that the system, if the 
interaction decays rapidly at infinity, does not undergo phase transitions, 
a fact usually known as Landau’s argument, see §152 in [LL67], and which 
can be made into a mathematical theorem proved, as such, by van Hove, 
[VH50], see §5.8 below. 


In fact it is now understood that the Van der Waals equation represents 
rigorously only a limiting situation, in which the particles have a hard core 
interaction (or a strongly repulsive one at close distance) and a further 
smooth long-range interaction $\varphi$: very small but with very long range. This 
is discussed in §5.2. 

As a final comment it is worth stressing that the virial theorem gives in 
principle the corrections to the equation of state in a rather direct and 
simple form as time averages of the virial of the internal forces. Since the 
virial of the internal forces is easy to compute if one knows the positions 
of the particles as a function of time we see that the theorem provides a 
method for computing the equation of state in numerical simulations. In 
fact this idea has been exploited in many numerical experiments, in which 
equation (5.1.5) plays a key role. 
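
As an illustration of how the virial enters a numerical experiment, the pressure can be estimated from stored particle positions. This is a sketch under stated assumptions and not a description of any specific simulation code: (5.1.5) is taken in the form $1/\beta = pv + \langle C_{int}\rangle/(3N)$, which is the form consistent with (5.1.12) and with the equation of state that follows it; the potential is the Lennard-Jones one assumed for (5.1.1); no periodic boundary conditions or cutoffs are applied; all function names are hypothetical.

```python
import math

def lj_virial_term(r, eps=1.0, r0=1.0):
    """r * dphi/dr for phi(r) = 4 eps ((r0/r)^12 - (r0/r)^6)."""
    x = (r0 / r) ** 6
    return 4.0 * eps * (-12.0 * x * x + 6.0 * x)

def internal_virial(positions, eps=1.0, r0=1.0):
    """C_int = sum over pairs of (q_i - q_j) . grad phi(q_i - q_j)."""
    total = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            total += lj_virial_term(r, eps, r0)
    return total

def virial_pressure(snapshots, volume, beta, eps=1.0, r0=1.0):
    """p = N/(beta V) - <C_int>/(3 V), averaging C_int over the snapshots."""
    n = len(snapshots[0])
    mean_c = sum(internal_virial(s, eps, r0) for s in snapshots) / len(snapshots)
    return n / (beta * volume) - mean_c / (3.0 * volume)

if __name__ == "__main__":
    # Two particles at the potential minimum: the internal virial vanishes.
    r_min = 2.0 ** (1.0 / 6.0)
    snapshot = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]
    print(virial_pressure([snapshot], volume=10.0, beta=0.5))
```

In an actual simulation the snapshots would be configurations sampled along the motion, so that the average over them plays the role of the time average in the virial theorem.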


§5.2. The Modern Interpretation of van der Waals’ Approximation 


Suppose that the system has an interaction potential 
$\varphi(r) = \varphi_{hc}(r/r_0) + \gamma^3\varphi_0(\gamma r/r_0)$ where $\varphi_{hc}(r/r_0)$ vanishes for $r > r_0$ and is $+\infty$ for $r < r_0$ (hard 
core potential), while $\varphi_0$ is a smooth function with short range (i.e. either 
eventually equal to 0 as $r \to \infty$, or tending to 0 exponentially fast as 
$r \to \infty$, say). Here $\gamma$ is a dimensionless parameter which is really used to 
set a variable value of the range to $\gamma^{-1} r_0$. 

In other words we assume that, apart from the hard core, the particles 
interact via a potential which is very long range as $\gamma \to 0$ but, at the same 
time, it becomes very weak. 

When $\gamma$ is very small and the density of the system is fixed to be $\rho$ we see 
that the energy of interaction between one particle and the remaining ones 
will be essentially entirely due to the particles that are very far apart: the 
close ones, being (relatively) few, will therefore contribute a small amount to 
the energy, because the strength of the potential is very weak, proportional 
to $\gamma^3$. 

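The energy of a particle in the force field of the others is in fact, if $\overline\varphi_0$ is 
the ($\gamma$-independent) integral of $\gamma^3\varphi_0(\gamma r/r_0)$, 


$$\rho\int \gamma^3\varphi_0\Big(\gamma\frac{r}{r_0}\Big)\,d^3r = \rho\int \varphi_0\Big(\frac{r}{r_0}\Big)\,d^3r \;\stackrel{def}{=}\; \rho\,\overline\varphi_0 \qquad (5.2.1)$$

so that the energy of a configuration in which the hard cores of the particles 
do not overlap will be essentially given, at least for small $\gamma$, by 


$$U = \frac{1}{2} N \rho\,\overline\varphi_0. \qquad (5.2.2)$$
The quantity $\rho\,\overline\varphi_0$ is sometimes called the mean field at density $\rho$. 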
The last relation allows us to compute immediately the canonical partition 
function. Let $V - Nb$ be the volume available to each particle, i.e. the 
total volume minus the volume occupied by the impenetrable hard cores of 
the particles: b is of the order $r_0^3$ and it will be taken, to be in agreement 
with (5.1.14), to be 
$$b = 4v_0 = 4\cdot\frac{4\pi}{3}\Big(\frac{r_0}{2}\Big)^3. \qquad (5.2.3)$$
Then if the energy of a configuration is well approximated by (5.2.2), the 
canonical partition function is approximately: 


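$$Z(\beta,v) = \int \frac{dp\,dq}{h^{3N} N!}\; e^{-\beta\big(\sum_i \frac{p_i^2}{2m} + \frac{1}{2}N\rho\,\overline\varphi_0\big)} \simeq \Big(\frac{2\pi m}{\beta h^2}\Big)^{\frac{3N}{2}} \frac{(V - Nb)^N}{N!}\; e^{-\frac{\beta}{2} N\rho\,\overline\varphi_0} \qquad (5.2.4)$$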

where a somewhat uncontrolled approximation is made about the q integra- 
tions, as clearly the integral over the configurations of N particles that are 
constrained to stay at a distance $r_0$ from each other is a highly nontrivial 
quantity: only naively can one hope to approximate it by $(V - Nb)^N$, even 
if we allowed simple adjustments of the value of the empirical “excluded 
volume” b to improve the approximation. 

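If one accepts (5.2.4) then the equation of state can be computed straight- 
forwardly. In fact the free energy $f_c$, see (2.3.8), is given by 


$$-\beta f_c(\beta,v) = \lim_{N\to\infty} \frac{1}{N}\log Z(\beta,v) = \log\big(\sqrt{2\pi m \beta^{-1} h^{-2}}\,\big)^3 - \frac{\beta\,\overline\varphi_0}{2v} + \log\big((v - b)\,e\big) \qquad (5.2.5)$$
leading, by differentiation, to 
$$\beta p(\beta,v) = -\beta\Big(\frac{\partial f_c}{\partial v}\Big)_\beta = \frac{\beta\,\overline\varphi_0}{2v^2} + \frac{1}{v - b} \qquad (5.2.6)$$


which coincides with (5.1.17) if $a = -\overline\varphi_0/2$, thus providing an alternative interpretation 
of the van der Waals equation and motivating the qualification, which is 
usually given to it, of mean field theory. 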
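
The step from (5.2.5) to (5.2.6) can be checked numerically by differentiating the free energy with a finite difference and comparing with the van der Waals form. This is a minimal sketch with illustrative names and arbitrary units ($m = h = 1$); the identification $a = -\overline\varphi_0/2$ is the one needed to match (5.1.17).

```python
import math

def minus_beta_f(beta, v, phi0_bar, b, m=1.0, h=1.0):
    """Right-hand side of (5.2.5): -beta * f_c(beta, v)."""
    kinetic = 1.5 * math.log(2.0 * math.pi * m / (beta * h * h))
    return kinetic - beta * phi0_bar / (2.0 * v) + math.log(v - b) + 1.0

def beta_p_from_f(beta, v, phi0_bar, b, dv=1e-5):
    """beta * p = d(-beta f_c)/dv, by a centered finite difference."""
    up = minus_beta_f(beta, v + dv, phi0_bar, b)
    dn = minus_beta_f(beta, v - dv, phi0_bar, b)
    return (up - dn) / (2.0 * dv)

def beta_p_vdw(beta, v, a, b):
    """Van der Waals form (5.1.17): beta (p + a/v^2)(v - b) = 1."""
    return 1.0 / (v - b) - beta * a / (v * v)

if __name__ == "__main__":
    beta, v, phi0_bar, b = 2.0, 3.0, -1.2, 0.5
    print(beta_p_from_f(beta, v, phi0_bar, b))
    print(beta_p_vdw(beta, v, a=-phi0_bar / 2.0, b=b))
```

The kinetic term does not depend on v and drops out of the derivative, which is why only $\overline\varphi_0$ and b survive in the equation of state.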

The above discussion shows that the van der Waals equation can be exact 
only if the interaction has extremely long range and is at the same time just 
weak enough to have a nonzero integral $\overline\varphi_0$: it is then correspondingly 
so small that any individual particle contribution to the potential energy 
of a fixed particle is negligible apart, of course, from the hard core energy 
which, unsatisfactorily, is taken into account by replacing the integral over 
the configurations of nonoverlapping cores by $(V - Nb)^N$. 

In fact the latter approximation can be eliminated by replacing (5.2.4) by 
the more accurate: 


$$Z(\beta,v) = \int \frac{dp\,dq}{h^{3N} N!}\; e^{-\beta\big(\sum_i \frac{p_i^2}{2m} + \frac{1}{2}\rho\,\overline\varphi_0 N\big)} = \Big(\frac{2\pi m}{\beta h^2}\Big)^{\frac{3N}{2}} e^{-\frac{\beta}{2}\rho\,\overline\varphi_0 N}\, Z_0(v), \qquad (5.2.7)$$


where the configuration integral $Z_0(v)$ (which is $\beta$-independent) over the 
nonoverlapping hard core configurations $\{q_i\}$ in the volume V is not com- 
puted. 

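Equations (5.2.7) then imply that the free energy and the equation of state 
of our gas are: 


$$-\beta f(\beta,v) = -\beta f_0(v) - \frac{\beta\,\overline\varphi_0}{2v}, \qquad \Big(p(\beta,v) - \frac{\overline\varphi_0}{2v^2}\Big)\beta = P_0(v) \qquad (5.2.8)$$


where 
$$P_0(v) = \frac{\partial}{\partial v}\,\lim_{N\to\infty}\frac{1}{N}\log Z_0(v) \qquad (5.2.9)$$


is the (temperature-independent) product of $\beta$ times the pressure $p_0$ of 
the hard core gas. It replaces its crude approximation $P_0(v) = \beta p_0(v) = 
(v - b)^{-1}$ in (5.2.6). 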

It is not difficult to see that the $\beta$-independence of $P_0(v)$ implies that, if 
$\overline\varphi_0 < 0$, i.e. if the potential has a long range attractive (i.e. negative) tail, 
then (5.2.8) will have, at low temperatures, a graph which is qualitatively 
similar to that of (5.1.17) with a > 0 (hence like Fig. 5.1.1). 

Thus the equation of state (5.2.8) will show phase transitions, and also the 
phenomena of negative compressibility and metastability. 

The negative compressibility can be eliminated by Maxwell’s rule. But one 
is still left with the unpleasant feeling that somehow one is doing something 
wrong. This is clearly signaled by the fact that in spite of the improvements 
in the approximations we are still getting a pressure that is a nonmonotonic 
function of the specific volume (if $\beta$ is large enough, i.e. if the temperature 
is low). 

At least in one-dimensional gases the excluded volume problem is trivial 
and one can simply check that $\beta p_0(v) = P_0(v)$ is indeed $\frac{1}{v - v_0}$ with $v_0 = r_0$, 
and, therefore, this is clearly a contradiction because we are getting a non 
monotonic pressure in a situation in which the theory of §4.3 does apply, 
and implies convexity of the free energy, i.e. monotonicity of the pressure. 

Continuing to denote by $P_0(v)$ the temperature-independent product 
$\beta p_0(\beta, v)$ of $\beta$ times the pressure of the pure hard core gas, the following 
result sheds a great amount of light on the intricacy of the above situation, 
showing that the presence of a negative compressibility region is an artifact 
of the mean field approximation: 


Theorem: Suppose that we fix $\gamma > 0$ and we call $p(\beta,v;\gamma)$ the canonical 
pressure (in the thermodynamic limit) for the gas interacting with the po- 
tential 
$$\varphi(r) = \varphi_{hc}\Big(\frac{r}{r_0}\Big) + \gamma^3\varphi_0\Big(\gamma\frac{r}{r_0}\Big) \qquad (5.2.10)$$
where $\varphi_{hc} = 0$ for $r > r_0$, $\varphi_{hc} = +\infty$ for $r < r_0$; $\varphi_0$ is a smooth potential 
rapidly decreasing at $\infty$ and with integral $\overline\varphi_0 < 0$. Let also $P_0(v)$ be the ($\beta$- 
independent) product of $\beta$ times the pressure of the pure hard core gas (in 
the thermodynamic limit), i.e. the pressure of the gas in which the particles 
interact only via the hard core potential $\varphi_{hc}$. Then: 


$$\beta\,p(\beta,v) \;\stackrel{def}{=}\; \lim_{\gamma\to 0} \beta\,p(\beta,v;\gamma) = \Big[\frac{\beta\,\overline\varphi_0}{2v^2} + P_0(v)\Big]_{\text{Maxwell rule}} \qquad (5.2.11)$$


where the subscript “Maxwell rule” means that in the regions where the 
right-hand side is not monotonic in v (existent if $\overline\varphi_0 < 0$ and $\beta$ is large) 
the pressure $p(\beta,v)$ is obtained with the help of the Maxwell construction 
discussed in §5.1. 


Thus we see that there is indeed a firm foundation to Maxwell’s rule which 
does not rest on dubious Carnot cycles: the van der Waals equation becomes 
rigorously valid in the limit in which the attractive tail of the potential 
becomes very weak but with such a long range that the mean potential (“mean 
field”), see (5.2.2), that it generates at a point has a fixed value. 

If the dimension is d = 1 the hard core gas pressure $P_0(v)$ is rigorously 
$\beta p_0 = (v - v_0)^{-1}$ and the equation of state becomes exactly the Van der 
Waals equation. In higher dimensions $\beta p_0 = (v - v_0)^{-1}$ is only an approxi- 
mation (no matter how $v_0$ is chosen), but the basic fact that the equation of 
state is a trivial modification of a reference, “simpler” (so to speak), system 
(the hard core gas) together with Maxwell’s rule remains valid. 
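
The one-dimensional statement can be illustrated with the standard configuration integral of hard rods. This is a sketch resting on an assumption that is not derived in the text: N rods of length $r_0$ lying entirely inside a segment of length L have configuration integral $(L - N r_0)^N / N!$ (other boundary conventions change this by terms that do not matter for the pressure in the thermodynamic limit). The function names are hypothetical.

```python
import math

def log_Z0_rods(N, L, r0):
    """log of (L - N r0)^N / N!, the assumed hard-rod configuration integral."""
    return N * math.log(L - N * r0) - math.lgamma(N + 1)

def beta_p0(N, v, r0, dL=1e-3):
    """beta * p0 = d(log Z0)/dL at L = N v, by a centered finite difference."""
    L = N * v
    up = log_Z0_rods(N, L + dL, r0)
    dn = log_Z0_rods(N, L - dL, r0)
    return (up - dn) / (2.0 * dL)

if __name__ == "__main__":
    for v in (1.5, 2.0, 4.0):
        print(v, beta_p0(1000, v, 1.0), 1.0 / (v - 1.0))
```

With this configuration integral the finite-difference pressure reproduces $(v - r_0)^{-1}$ for every N, so that in d = 1 the "excluded volume" is exactly $v_0 = r_0$.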

One can also say that the van der Waals equation arises when one inter- 
changes two limits: the thermodynamic limit and the limit of infinite range 
$\gamma \to 0$. It is obvious that if instead of taking the limit $V \to \infty$ first and 
then the limit $\gamma \to 0$ one did the opposite then the equation of state would 
have been $p = p_0$ and the attractive tail would have given no contribution. 

The potentials like (5.2.10) are called Kac’s potentials, [HKU63] and one 
can say that the above theorem plays a role analogous to that of Lanford’s 
theorem for the Boltzmann equation, see §1.8 and §1.9: in both cases a 
statement that has approximate validity becomes exact in a suitable limit. 
And in both cases the statement seems incompatible with obvious properties 
of the system (reversibility in the first case and strict convexity in finite 
volume systems of the free energy in the second), although of course the 
first case concerns a far more fundamental problem than the second. 

But both cases are instances of a method of analysis that has been devel- 
oped very much in the twentieth century, in which one tries to understand 
some properties that cannot be exactly true in normal situations but that 
become exactly true in suitable limiting situations thus leading to a more 
or less detailed understanding of why they may look true even when the 
limit is not taken. Of course a complete theory should also come together 
with estimates (and possibly reasonable ones) of how far we are, in concrete 
situations, from the limiting cases (i.e. how big are the corrections to the 
quantities in which we might be interested). 

The method is a modern interpretation of the basic conception of Boltz- 
mann on the relation between the apparent continuum of reality, as we 
perceive it and input it in most of our models or theories for its interpreta- 
tion and understanding, and the possibly intrinsic and deep discrete nature 
of reality and of our own thinking. 

This is exemplified by the quotation in §1.1 and in many others among 
Boltzmann’s writings, for instance: 


“The concepts of differential and integral calculus separated from any atom- 
istic idea are truly metaphysical, if by this we mean, following an appropriate 
definition of Mach, that we have forgotten how we acquired them”, p. 56 in 
[Bo74]. 


And I cannot resist the temptation of more quotations, as this is really 
music for the mind: 


“Through the symbols manipulations of integral calculus, which have become 
common practice, one can temporarily forget the need to start from a finite 
number of elements, that is at the basis of the creation of the concept, but 
one cannot avoid it”; p. 55 in [Bo74]. 


or: 


“Differential equations require, just as atomism does, an initial idea of a 
large finite number of numerical values and points ...... Only afterwards 
it is maintained that the picture never represents phenomena exactly but 
merely approximates them more and more the greater the number of these 
points and the smaller the distance between them. Yet here again it seems 
to me that so far we cannot exclude the possibility that for a certain very 
large number of points the picture will best represent phenomena and that 
for greater numbers it will become again less accurate, so that atoms do exist 
in large but finite number”, see p. 227 in [Bo74]; 


and: 


“This naturally does not exclude that, after we got used once and for all to 
the abstraction of the volume elements and of the other symbols [of calculus] 
and once one has studied the way to operate with them, it could look handy 
and luring, in deriving certain formulae that Volkmann calls formulae for 
the coarse phenomena, to forget completely the atomistic significance of such 
abstractions. They provide a general model for all cases in which one can 
think to deal with $10^{10}$ or $10^{10^{10}}$ elements in a cubic millimeter or even with 
billions of times more; hence they are particularly invaluable in the frame 
of Geometry, which must equally well adapt to deal with the most diverse 
physical cases in which the number of the elements can be widely different. 
Often in the use of all such models, created in this way, it is necessary to 
put aside the basic concept, from which they have overgrown, and perhaps to 
forget it entirely, at least temporarily. But I think that it would be a mistake 
to think that one could become free of it entirely.” ² 


The latter sentence, p. 55 in [Bo74], reminds us that the evaluation of the 
corrections is of course a harder problem, which it would be a mistake to set 
aside, even in the above case of mean field theory. In fact the corrections 
are quite important and somehow even more important than the mean field 
theory itself, which will remain as a poor idealization of far more interesting 
cooperative phenomena. 

It should also be noted that the above analysis does not allow us to solve 
a fundamental question: can classical statistical mechanics predict and de- 
scribe phase transitions? We have seen that the van der Waals theory is 
no proof that when no infinite range mean field limit is taken (i.e. in most 
interesting cases) then a system can show phase transitions. 

This clearly emerges from the remark that if $\gamma$ is not 0 then in d = 1 one can 
prove, as a theorem (see above, and §5.8), that the system cannot undergo 
any phase transition whatsoever (and the pressure is strictly monotonic in 
the specific volume, “no plateau” at all); and nevertheless the above theorem 
also holds in d = 1, where in fact it leads precisely to the Van der Waals 
equation (with the volume of the box being replaced by the length of the 
box and similar obvious changes). 

It is therefore important to see whether genuinely short-range models (no 
$\gamma$ around) generate equations of state with phase transitions. This will be 
thoroughly discussed in Chap.VI in simplified models, because in the cases 
in which one would like to have results the problem is still open. We shall 
see that in the simplified models phase transitions are possible even when 
the interactions have short range, and the analysis will leave little doubt (in 
fact no doubt at all) that phase transitions are possible in classical statistical 
mechanics, without the necessity of introducing any new assumptions or new 
physical laws. 

We should however mention that important breakthroughs seem to be un- 
der way: see [Jo95] and [LMP98]. 

The standard approach to the van der Waals theory (also called mean field 
theory) can be found in [CC53], p. 284. A more refined and interesting 
formulation is in [VK64]. A precise and very clear theory is in [LP66]. The 
first precise understanding (and full proof in particular cases) of mean field 
theory is in [HKU63], in a series of papers reproduced, with introductory 
remarks, in [LM66]. A more phenomenological but very interesting and 
original theory is in the book [Br65], where the most common phase tran- 
sitions are treated from the unifying point of view of the mean field theory. 
The original work of van der Waals has been reprinted, [VW88]. 


² Lucretius would not have said it better. 
§5.3. Why a Thermodynamic Formalism? 


In the next sections we devote ourselves to a more detailed analysis of the 
framework in which phase transitions could be placed and of the techniques 
that one may envisage to apply towards an understanding of their proper- 
ties. It will be a rather abstract analysis, usually called “thermodynamic 
formalism” that plays in statistical mechanics a role akin to that played by 
the Hamiltonian formalism in mechanics. 

One does not have to recall that the formalism of Hamilton, in itself, does 
not make mechanics problems any easier than other formalisms. However 
it has become a tautology that it is a very appropriate formalism to de- 
scribe mechanical phenomena. The same can, or should, be said about the 
thermodynamic formalism. 

The theory of orthodic ensembles provides us with a model of Thermody- 
namics but, strictly speaking, only in the limit of infinite volume. In this 
situation one also obtains equivalence between the various ensembles, see 
Chap.II. 

The elements of the orthodic ensembles describe in great detail the struc- 
ture of the thermodynamic phases (i.e. macroscopic states), well beyond the 
simple microscopic definition of the classical thermodynamic quantities, and 
even provide us with the (surprising) possibility of computing theoretically 
some relations between them (e.g. the equation of state). Every element 
of a statistical ensemble describes details of the microscopic configurations 
that are typical of the corresponding phase, because it gives the probability 
of each individual microscopic configuration. 

The problem of the “thermodynamic limit” theory is that of establishing 
a formalism in which it becomes possible to make precise and sharp various 
statements that we have made so far, on intuitive or heuristic grounds, and 
thus lay the grounds for a deeper analysis and for deeper physical questions. 

We shall only consider the case of classical statistical mechanics, in which 
one neglects the size of Planck’s constant h. 

What follows, as stated at the end of the previous section, is a formalism: 
as with all formalisms it has interest only because it provides a natural 
frame (as experience taught us) in which the discussion of the most impor- 
tant questions and applications can be situated. This is not the place to 
argue that this is the best formalism: others are possible and in the end 
equivalent. But we need a formalism just in order to formulate precise ques- 
tions, capable of being given quantitative answers. The amount of work to 
be done will be independent of the formalism used (of course). 

It is well known that for each class of problems the formalism in which they 
are formulated often has a clarifying and unifying role: the emergence of a 
“good” formalism often follows the solution of important problems 
in the field. This seems to be the case of the thermodynamic formalism and 
the following few sections should be understood from this viewpoint. 

As an example of the problems that it would be premature to formulate 
without a clear formalism in which they fit one can quote: 


(1) describing the spatial correlations between particles in a gas, 


(2) describing (and in fact defining) the surfaces of separation between 
different, but coexisting, phases, 


(3) understanding the formation and dissociation of gas molecules or atoms 
into their more primitive constituents in stationary state situations, and 
other cooperative phenomena. 


An initial question is in which sense an element of a statistical ensemble 
describes a probability distribution on phase space once the limit of infinite 
volume has been taken. We shall consider here only the grand canonical 
ensemble representation of the equilibrium states, because it is somewhat 
easier to discuss than the canonical or microcanonical. We examine the case 
of a system of identical particles with mass m enclosed in a cubic box V. 

The particle interaction will be assumed to take place via a potential $\varphi$ 
satisfying at least the stability and temperedness conditions (4.1.1), that 
are necessary according to the analysis of Chap.IV in the theory of the 
ensembles: i.e. $\Phi(q_1,\ldots,q_n) = \sum_{i<j} \varphi(q_i - q_j) \ge -Bn$ (“stability”) and 
$|\varphi(r)| < C|r|^{-(3+\varepsilon)}$ for $|r| > r_0 > 0$ (“temperedness”), $B, C, \varepsilon > 0$. 

To avoid several technical problems it will also be convenient to suppose 
that the potential $\varphi$ has a hard core with diameter $r_0$, i.e. it is defined as the 
sum of a smooth potential plus a singular potential which is $+\infty$ for $|r| < r_0$ 
so that $\varphi(r) = +\infty$ for $|r| < r_0$. This has the physical significance that two 
particles cannot be closer than $r_0$, but a large part of what will be discussed 
does apply, with suitable modifications (and several open problems left), to 
the case of a superstable potential. This is a potential such that there are 
two constants A, B > 0 such that 


$$\Phi(q_1,\ldots,q_n) \ge -Bn + An^2/V \quad \text{if } q_i \in V \qquad (5.3.1)$$


where V is an arbitrary cubic volume containing an arbitrary number $n \ge 2$ 
of particles located at $q_1,\ldots,q_n$. The Lennard-Jones potential, see (5.1.1), 
is a typical example of a superstable potential. However the potential $\varphi = 0$, 
the free gas model, is not superstable (although it is trivially stable), see 
[Ru70] for a general theory of such potentials. 

Let V be a cubic volume and consider the element $\mu^{(\beta,\lambda,V)}$ of the grand 
canonical ensemble with parameters $(\beta, \lambda)$ and with particles confined in 
V: $\beta = 1/k_B T$, $k_B$ = Boltzmann’s constant and T = temperature, $\lambda$ = 
chemical potential, see §2.5. The probability of finding n particles in the 
microscopic state $dp_1 \ldots dp_n\, dq_1 \ldots dq_n$ in the distribution $\mu^{(\beta,\lambda,V)}$ is: 


$$\mu^{(\beta,\lambda,V)}(dp_1 \ldots dq_n) = \frac{e^{-\beta(E(p,q) + \lambda n)}}{\Xi}\; \frac{dp_1 \ldots dp_n\, dq_1 \ldots dq_n}{h^{3n}\, n!} \qquad (5.3.2)$$


where $E(p,q) = T(p) + \Phi(q) = \sum_{i=1}^{n} p_i^2/2m + \Phi(q_1,\ldots,q_n)$ and $\Xi$ is the 
grand canonical partition function, see §2.5. We want to take the limit of (5.3.2) as 
$V \to \infty$ and interpret it as a probability distribution on the infinite system 
configurations that one reaches in this way. 

One begins by giving a precise definition of the infinite system configura- 
tions; then comes the problem of giving a meaning to the limit as $V \to \infty$ 
of (5.3.2) and finally one will want to characterize the distributions that are 
found by following this limiting procedure starting from (5.3.2), or start- 
ing from the more general grand canonical distributions with fixed external 
particle boundary conditions; the latter were introduced, as generalization 
of (5.3.2), in §2.5: see (2.5.2). 


85.4. Phase Space in Infinite Volume and Probability Distribu- 
tions on it. Gibbs Distributions 


It is natural to define the phase space $\widetilde M$ in infinite volume as the space
of the sequences $(p,q) = (p_i,q_i)_{i=1}^\infty$ of momenta and positions such that
in every finite (cubic) volume there are only finitely many particles, called
locally finite configurations: if we consider systems of particles with hard
core of diameter $r_0 > 0$ this will be “automatic”, as the only configurations
$q$ that we have to consider are those with $|q_i - q_j| \ge r_0$, for $i \ne j$.

However, to take into account microscopic indistinguishability the configu-
ration space will not be $\widetilde M$, but the space $M$ obtained from $\widetilde M$ by identifying
sequences $(p,q)$ differing by a permutation of the particles.
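
Why local finiteness is “automatic” in the hard core case can be seen from a crude packing bound. The sketch below rests on an elementary estimate that is not in the text: balls of radius $r_0/2$ centered at the particles are disjoint and fit in a cube of side $L + r_0$, so their total volume cannot exceed $(L+r_0)^3$. Function names are hypothetical.

```python
import math
from itertools import combinations


def respects_hard_core(positions, r0):
    """True if all pairs of points are at distance >= r0."""
    return all(math.dist(a, b) >= r0 for a, b in combinations(positions, 2))


def max_particles_in_cube(side, r0):
    """Upper bound on the number of hard core particles in a cube.

    Disjoint balls of diameter r0 centered in the cube lie inside a cube
    of side (side + r0), hence n * (pi * r0**3 / 6) <= (side + r0)**3.
    """
    ball_volume = math.pi * r0 ** 3 / 6.0
    return math.floor((side + r0) ** 3 / ball_volume)
```

The bound is far from optimal, but it is finite for every finite cube, which is all that local finiteness requires.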

A probability distribution on $M$ is in general defined so that the following
question makes sense: what is the probability that by looking in a given (cu-
bic) volume $V^0$ one finds in it exactly $g$ particles with momenta in $dp_1 \dots dp_g$
and positions in $dq_1 \dots dq_g$?

Therefore the probability distribution $\mu$ will be characterized by the func-
tions $f_{V^0}(p_1,\dots,p_g,q_1,\dots,q_g)$ such that the quantity:

$$f_{V^0}(p_1,\dots,p_g,q_1,\dots,q_g)\,\frac{dp_1 \dots dp_g\, dq_1 \dots dq_g}{g!} \tag{5.4.1}$$

is the probability just described. The functions $f_{V^0}$ will be called the local
distributions of $\mu$: the factor $g!$ could be included in $f_{V^0}$, but it is customary
not to do so since the particles are indistinguishable and this factor simplifies
combinatorial considerations.

By using the functions $f_{V^0}$ it will be possible to evaluate the average value
of a localized observable, localized inside the volume $V^0$: this is, by def-
inition, a function on phase space that depends on $(p,q) \in M$ only via
the state of the particles located in $V^0$. Adopting the convention that
the $g = 0$ term of the sum below is the contribution of the configuration
with no particles in $V^0$, if $F$ is such a local observable we can
write its average as

$$\langle F \rangle = \sum_{g=0}^\infty \int F(p_1,\dots,p_g,q_1,\dots,q_g)\, f_{V^0}(p_1,\dots,p_g,q_1,\dots,q_g)\,
\frac{dp_1 \dots dp_g\, dq_1 \dots dq_g}{g!} \tag{5.4.2}$$

Consider a probability distribution like (5.3.2) describing a particle system
enclosed in a “large global” volume $V$ that we suppose cubic. With a fixed
$V^0 \subset V$ (we think here of $V$ as huge and $V^0$ as much smaller), one can
compute the probability that inside $V^0$ the configuration $(p,q)$ will consist
of $g$ particles in $dp_1 \dots dp_g\, dq_1 \dots dq_g$. Once the appropriate integrals are
performed one will find necessarily an expression like

$$f^{(V)}_{V^0}(p_1,\dots,p_g,q_1,\dots,q_g)\,\frac{dp_1 \dots dp_g\, dq_1 \dots dq_g}{g!} \tag{5.4.3}$$

It is then natural to define the limit as $V \to \infty$ of the probability distribution
$\mu^{(\beta,\lambda,V)}$, (5.3.2),³ as the distribution $\mu$ on $M$ characterized by the local
distributions

$$f_{V^0}(p_1,\dots,p_g,q_1,\dots,q_g) = \lim_{V\to\infty} f^{(V)}_{V^0}(p_1,\dots,p_g,q_1,\dots,q_g) \tag{5.4.4}$$

provided the limit exists for each $V^0$.
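
In the free gas case the limit (5.4.4) is trivial, and this can be verified numerically. The sketch below assumes $\varphi = 0$ and considers only the number of particles found in $V^0$: if the total number in $V$ is Poisson with mean $zV$ (with `z` a hypothetical activity parameter) and each particle is uniformly placed, the number in $V^0$ is Poisson with mean $zV^0$, independently of $V$.

```python
import math


def poisson(mean, g):
    """Poisson probability of the value g, computed through logarithms."""
    return math.exp(-mean + g * math.log(mean) - math.lgamma(g + 1))


def local_number_probability(z, v_small, v_big, g, n_max=200):
    """Probability of finding g particles in V0 inside V for the free gas.

    Sums over the total number n of particles in V; each particle lies
    in V0 independently with probability v_small / v_big.
    """
    p = v_small / v_big
    total = 0.0
    for n in range(g, n_max + 1):
        inside = math.comb(n, g) * p ** g * (1.0 - p) ** (n - g)
        total += poisson(z * v_big, n) * inside
    return total
```

The result does not depend on the size of the global container, so the sequence $f^{(V)}_{V^0}$ is constant in $V$ here; for interacting particles the dependence on $V$ is genuine and the existence of the limit is a theorem.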

It can be shown that if the interparticle potential $\varphi$ is superstable, see
(5.3.1), and hence a fortiori if it has a hard core, then the limit (5.4.4) exists,
at least along subsequences of any sequence of volumes $V$ with $V \to \infty$. In
the hard core case this is an almost obvious “compactness argument” (i.e. a
“free” argument based on abstract nonsense).⁴

The same remains true if $\mu^{(\beta,\lambda,V)}$ is replaced by a more general element of a
grand canonical ensemble with fixed external particle boundary conditions,
provided the external particle density “does not grow too fast with their
distance to the origin”, see §2.5.

The latter condition means that, fixing a length unit $\ell$ (arbitrarily), the
number $n(\Delta)$ of external particles in a box $\Delta$ with side size $\ell$ does not grow
too fast with the distance $d(\Delta,O)$ of $\Delta$ from the origin, e.g. it satisfies

$$\frac{n(\Delta)}{d(\Delta,O)} \xrightarrow[d(\Delta,O)\to\infty]{} 0. \tag{5.4.5}$$

This condition is automatically satisfied if the interaction has hard core; if
it is not satisfied then it is not difficult to find a configuration of external
particles such that the above limit does not exist (or is “unreasonable”).⁵


3 When one imagines the volume of the “global” container increasing to $\infty$, keeping the
size of the region $V^0$ that is under scrutiny fixed.

4 On the contrary it is highly nontrivial, when true, to prove the existence of the limit
$V \to \infty$ without restricting $V$ to vary along a “suitable” subsequence.

5 Consider in fact a system of particles interacting via a Lennard-Jones potential, (5.1.1),
which is $< -b$ for distances between $a$ and $2a$. Let $V$ be a cubic container and distribute
outside $V$, at distance exactly $a$, $M = N^c$ external particles, $c > 2/3$. Inside the volume
$V$ suppose that there are $N$ particles with $N = \rho V$, $\rho > 0$. With such boundary
condition the canonical distribution has a thermodynamic limit in which there are no
particles in any finite region with probability 1 (i.e. $f_{V^0} = 0$ for all $V^0$). In fact one
checks that, putting all $N$ internal particles in a corridor of width $a$ around the boundary,
one gets a set of configurations with energy $< O(-N N^c V^{-2/3})$ and phase space volume
$O((aV^{2/3})^N N!^{-1})$. Then a comparison argument similar to those of §4.1 to study the
various catastrophes applies. Likewise one can discuss the corresponding example in the
grand canonical ensemble (or the microcanonical).

Therefore we can define the set of Gibbs distributions on the phase space $M$
as the set of all possible distributions that are obtainable as limits of conver-
gent subsequences, in the sense of (5.4.4), of grand canonical distributions
$\mu^{(\beta,\lambda,V)}$ with periodic boundary conditions or with fixed external particle
boundary conditions whose density does not grow too fast at infinity in the
above sense.

The distributions that are obtained in this way will define the equilibrium
phases of the system (see §2.5) and are not necessarily invariant under trans-
lations, i.e. such that for every displacement $\xi \in R^3$:

$$f_{V^0+\xi}(p_1,\dots,p_g,q_1+\xi,\dots,q_g+\xi) = f_{V^0}(p_1,\dots,p_g,q_1,\dots,q_g) \tag{5.4.6}$$


Except in the (important) special case in which periodic boundary condi-
tions are used, translation invariance symmetry is broken by the fact that
the system is, before the thermodynamic limit $V \to \infty$, enclosed in a finite
box $V$; and it is not necessarily true that the invariance is “restored” by the
mere fact that we send $V \to \infty$.

The physical phenomenon related to the above (possible) spontaneous 
breakdown of translation symmetry is the possibility of the existence of 
thermodynamic states in which pure phases coexist occupying, for instance, 
each half of the total space allotted to the system, being separated by a 
microscopically well-defined surface: one should think here of a liquid in 
equilibrium with its vapor. 

Therefore we shall distinguish the set of Gibbs distributions $G^0(\beta,\lambda)$ from
its subset $G(\beta,\lambda) \subset G^0(\beta,\lambda)$ consisting of the distributions which are invari-
ant under translations, i.e. which have local distributions satisfying (5.4.6).

If $\mu$ is a translation-invariant probability distribution on $M$ and if $S =
(S_1, S_2, S_3)$ are the translations by one length unit, in the three directions
(i.e. $S_\alpha (p_i,q_i)_{i=1}^\infty = (p_i, q_i + e^\alpha)_{i=1}^\infty$ with $e^\alpha$, $\alpha = 1,2,3$, being the unit vector
in the $\alpha$-th direction), then the triple $(M, S, \mu)$ is, according to a well es-
tablished terminology, a dynamical system; this is a useful fact to bear in
mind, as we shall see on several occasions.

One could, of course, define the Gibbs distributions by starting from distri- 
butions of the canonical ensemble (or microcanonical or any other orthodic 
ensemble) with fixed external particle boundary conditions. 

By so doing one would generate the problem of the equivalence of the en- 
sembles (see Chap.II) in the sense that one should show that the totality of 
the Gibbs distributions on M built starting from the grand canonical en- 
semble distributions with fixed external particles boundary conditions does 
coincide with the totality of the Gibbs distributions (on M) built start- 
ing with microcanonical or canonical distributions with periodic or fixed 
external particle boundary conditions. 

The analysis of the latter question is difficult: it is essentially complete only
in the case of hard core systems, [Do68a], [Do72], [LR69]; but it is somewhat
incomplete in the “general” case of superstable potentials, see [Ru70], [La72],
[Ge93]. Nevertheless there is no evidence that there might be conceptual
problems on such matters.


§5.5. Variational Characterization of Translation Invariant Gibbs Distributions


If we restrict our attention to the translation-invariant Gibbs distributions
$\mu \in G(\beta,\lambda)$, also called homogeneous phases, then an alternative and inter-
esting variational characterization of them is often possible.

The first simple remark, that stems immediately from (5.4.4), or from
its variants with different fixed external particles boundary conditions,
is that if $\mu \in G^0(\beta,\lambda)$ then the momenta distribution is Maxwellian,
i.e. $f_{V^0}(p_1,\dots,p_g,q_1,\dots,q_g)$ can be written as

$$f_{V^0}(p_1,\dots,p_g,q_1,\dots,q_g) = \frac{e^{-\beta \sum_{i=1}^g p_i^2/2m}}{\sqrt{2\pi m \beta^{-1}}^{\,3g}}\,
\bar f_{V^0}(q_1,\dots,q_g) \tag{5.5.1}$$

where the factor in the square root is introduced because it provides an ob-
viously convenient normalization, making $\bar f$ a quantity with the dimension
of an inverse length to the power $3g$ since the quantity

$$\gamma(\beta) = \sqrt{2\pi m \beta^{-1}} \tag{5.5.2}$$

is a “momentum”. Sometimes one defines instead $\gamma(\beta) = \sqrt{2\pi m \beta^{-1} h^{-2}}$,
including in it also the factor $h^{-3g}$ that appears in (2.2.1); with this choice
$\bar f_{V^0}$ would be dimensionless.
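
The normalization in (5.5.1) is just the Gaussian integral $\int e^{-\beta p^2/2m}\,dp = \sqrt{2\pi m\beta^{-1}}$ for each of the $3g$ momentum components. A minimal numerical check, with hypothetical function names and a plain midpoint rule, is the following sketch.

```python
import math


def gamma_momentum(beta, mass):
    """The quantity (5.5.2): sqrt(2*pi*m/beta), with dimension of a momentum."""
    return math.sqrt(2.0 * math.pi * mass / beta)


def gaussian_integral_1d(beta, mass, cutoff_sigmas=10.0, steps=20000):
    """Midpoint-rule estimate of the integral of exp(-beta*p**2/(2m)) dp."""
    sigma = math.sqrt(mass / beta)
    p_max = cutoff_sigmas * sigma
    dp = 2.0 * p_max / steps
    total = 0.0
    for i in range(steps):
        p = -p_max + (i + 0.5) * dp
        total += math.exp(-beta * p * p / (2.0 * mass))
    return total * dp
```

Since the Maxwellian factor in (5.5.1) integrates to 1 over the momenta, all the nontrivial information about the phase is carried by the configurational part $\bar f_{V^0}$.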

The probability distributions on phase space $M$ with local distributions
that depend upon the momenta as in (5.5.1) are called Maxwellian distri-
butions. The problem is therefore that of characterizing $\bar f_{V^0}$ so that the
distribution defined by (5.5.1) is in $G(\beta,\lambda)$.

Going back to a finite total volume a well-known argument shows that
(5.3.2) satisfies a variational principle. More precisely let $(p,q)$ abbreviate
$(p_1,\dots,p_n,q_1,\dots,q_n)$ and write (5.3.2) as

$$f(p_1,\dots,p_n,q_1,\dots,q_n)\,\frac{dp_1 \dots dp_n\, dq_1 \dots dq_n}{n!} = f_n(p,q)\,\frac{dp\,dq}{n!} \tag{5.5.3}$$

and set

$$E_n(p,q) = \sum_{i=1}^n \frac{p_i^2}{2m} + \Phi(q_1,\dots,q_n) = T_n(p) + \Phi_n(q). \tag{5.5.4}$$

Then consider the functional $J(f) = S(f) - \beta U(f) - \beta\lambda N(f)$ defined on
the functions $f$:

$$\begin{aligned}
J(f) &= -\sum_{n=0}^\infty \int f_n(p,q) \log f_n(p,q)\,\frac{dp\,dq}{n!}
 - \beta \sum_{n=0}^\infty \int f_n(p,q)\,(E_n(p,q) + \lambda n)\,\frac{dp\,dq}{n!} =\\
&= -\sum_{n=0}^\infty \int f_n(p,q)\,\big(\log f_n(p,q) + \beta E_n(p,q) + \beta\lambda n\big)\,\frac{dp\,dq}{n!}.
\end{aligned} \tag{5.5.5}$$


By the Lagrange multiplier method one checks that $J(f)$ is stationary (ac-
tually a maximum) on the set of the $f \ge 0$ such that

$$\sum_{n=0}^\infty \int f_n(p,q)\,\frac{dp\,dq}{n!} = 1 \tag{5.5.6}$$

if, still with the notation $(p,q) = (p_1,\dots,p_n,q_1,\dots,q_n)$, the functions $f_n$
satisfy:

$$-\log f_n(p,q) - \beta\,(E_n(p,q) + \lambda n) = \text{constant} \tag{5.5.7}$$

i.e. if $f$ is given by (5.3.2).
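
The mechanism behind this variational principle can be tested on a finite caricature. The sketch below replaces phase space by a finite list of states, each with an energy and a particle number (an assumption made only for illustration; names are hypothetical), and checks that the Gibbs weights maximize the discrete analogue of $J$, with maximum value $\log \Xi$.

```python
import math


def functional_J(f, energies, numbers, beta, lam):
    """Discrete analogue of J(f): -sum f*(log f + beta*(E + lam*n))."""
    total = 0.0
    for fi, e, n in zip(f, energies, numbers):
        if fi > 0.0:  # the term f*log(f) vanishes when f = 0
            total -= fi * (math.log(fi) + beta * (e + lam * n))
    return total


def gibbs_weights(energies, numbers, beta, lam):
    """Normalized weights exp(-beta*(E + lam*n)) / Xi, together with Xi."""
    w = [math.exp(-beta * (e + lam * n)) for e, n in zip(energies, numbers)]
    xi = sum(w)
    return [x / xi for x in w], xi
```

For any other probability vector $f$ the difference $J(f^*) - J(f)$ is the relative entropy of $f$ with respect to the Gibbs weights $f^*$, hence nonnegative; this is the finite version of the statement that the stationary point is actually a maximum.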

It is natural, at this point, to introduce the space of all translation invariant
distributions $\mu$ on $M$ that have a Maxwellian momentum distribution and
to define on this space the following functionals: the “specific volume”, the
“total energy” and the “potential energy” corresponding to the interparticle
potential $\varphi$, and the “entropy”.

We denote such functionals by $v(\mu)$, $u_\varphi(\mu)$, $\bar u_\varphi(\mu)$ and $s(\mu)$, respectively,
and we write them first in the general case and then we shall consider the
expression that they assume when the $f_{V^0}$ have the Maxwellian form (5.5.1).

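To simplify we also abbreviate the notation for the local distributions in
the volume $V^0$, see (5.5.1), as

$$f_{V^0,g}(p,q) = \frac{e^{-\beta T_g(p)}}{\sqrt{2\pi m \beta^{-1}}^{\,3g}}\,\bar f_{V^0,g}(q) \tag{5.5.8}$$

where $(p,q)$ stands for $(p_1,\dots,p_g,q_1,\dots,q_g)$ and $dp\,dq = dp_1 \dots dp_g\,
dq_1 \dots dq_g$. Then the specific volume of $\mu$ will be defined by

$$v(\mu)^{-1} = \lim_{V^0\to\infty} \frac{1}{V^0} \sum_{g=0}^\infty g \int f_{V^0,g}(p,q)\,\frac{dp\,dq}{g!}
= \lim_{V^0\to\infty} \frac{1}{V^0} \sum_{g=0}^\infty g \int \bar f_{V^0,g}(q)\,\frac{dq}{g!}. \tag{5.5.9}$$

With the notations in (5.5.4) the total energy will be

$$\begin{aligned}
u_\varphi(\mu) &= \lim_{V^0\to\infty} \frac{1}{V^0} \sum_{g=0}^\infty \int \big(T_g(p) + \Phi_g(q)\big)\, f_{V^0,g}(p,q)\,\frac{dp\,dq}{g!} =\\
&= \lim_{V^0\to\infty} \frac{1}{V^0} \sum_{g=0}^\infty \int \Big(\frac{3g}{2\beta} + \Phi_g(q)\Big)\, \bar f_{V^0,g}(q)\,\frac{dq}{g!}
= \frac{3}{2\beta}\,v(\mu)^{-1} + \bar u_\varphi(\mu)
\end{aligned} \tag{5.5.10}$$

where in the first step the gaussian, hence trivial, integrals over the momenta
$p$ have been performed explicitly and in the second step we used (5.5.9); here
$\bar u_\varphi(\mu)$ is the limit of the term containing $\Phi_g$.
Likewise the thermodynamic entropy is (here phase space volume is measured in
units of $h^{3g}$, i.e. the dimensionless normalization mentioned after (5.5.2) is used):

$$\begin{aligned}
s(\mu) &= \lim_{V^0\to\infty} -\frac{1}{V^0} \sum_{g=0}^\infty \int f_{V^0,g}(p,q) \log f_{V^0,g}(p,q)\,\frac{dp\,dq}{h^{3g} g!} =\\
&= \lim_{V^0\to\infty} -\frac{1}{V^0} \sum_{g=0}^\infty \int \frac{e^{-\beta T_g(p)}}{\sqrt{2\pi m \beta^{-1} h^{-2}}^{\,3g}}\,\bar f_{V^0,g}(q)\,
\Big(-\beta T_g(p) - \frac{3g}{2} \log(2\pi m \beta^{-1} h^{-2}) + \log \bar f_{V^0,g}(q)\Big)\frac{dp\,dq}{h^{3g} g!} =\\
&= \lim_{V^0\to\infty} -\frac{1}{V^0} \sum_{g=0}^\infty \int \bar f_{V^0,g}(q)\,
\Big(-\frac{3g}{2} \log(2\pi e\, m \beta^{-1} h^{-2}) + \log \bar f_{V^0,g}(q)\Big)\frac{dq}{g!} =\\
&= v(\mu)^{-1} \log(2\pi e\, m \beta^{-1} h^{-2})^{3/2} + \bar s(\mu),
\end{aligned} \tag{5.5.11}$$

where $\bar s(\mu)$ denotes the limit of the term containing $\log \bar f_{V^0,g}$.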


All limits above do exist in the case of systems with hard core potentials: 
to prove this the techniques are similar to those used in Chap.IV to discuss 
the existence of the thermodynamic limit. The limits, however, exist under 
much more general conditions that we shall not discuss here. 

We now maximize, on the space of the Maxwellian translation-invariant
distributions $\mu$ on $M$ (with inverse temperature $\beta$, i.e. having the form
(5.5.1)), the functional:

$$s(\mu) - \beta\lambda\, v(\mu)^{-1} - \beta u_\varphi(\mu) \tag{5.5.12}$$

and let $\beta p(\beta,\lambda)$ denote the supremum of (5.5.12).

We proceed by quoting only results that are valid in the case of hard core
systems, to avoid discussions on the more general superstable case (for which
similar, but less satisfactory results can be obtained), the general discussion
being somewhat technical, [Ru70]. If $\varphi$ has a hard core one has:

$$\beta p(\beta,\lambda) = \max_{\mu}\, \big(s(\mu) - \beta\lambda\, v(\mu)^{-1} - \beta u_\varphi(\mu)\big) \tag{5.5.13}$$

and the maximum is reached exactly on the translation-invariant Gibbs dis-
tributions $\mu \in G(\beta,\lambda)$, and only on them; see [Do68a], [LR69].⁶ One can
check that the meaning of the maximum value $p(\beta,\lambda)$ is that of “pressure”
(leaving aside mathematical rigor this is, in fact, quite clear from the dis-
cussions in §2.5 and above).
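
As a consistency check in the simplest case, the functional in (5.5.12) can be evaluated on the free gas ($\varphi = 0$, which is outside the hard core class, so this is only a heuristic illustration). The sketch assumes that the competing distributions are restricted to Poisson distributions of density $\rho$ in the positions, i.e. $\bar f_{V^0,g} = \rho^g e^{-\rho V^0}$, and that phase space volume is measured in units of $h^{3g}$; the symbol $z$ is introduced here only for convenience.

```latex
\begin{aligned}
v(\mu)^{-1} &= \rho, \qquad u_\varphi(\mu) = \frac{3}{2\beta}\,\rho, \qquad
s(\mu) = \rho \log (2\pi e\, m \beta^{-1} h^{-2})^{3/2} + \rho - \rho\log\rho,\\
s(\mu) - \beta\lambda\, v(\mu)^{-1} - \beta u_\varphi(\mu)
 &= \rho\,(\log z - \log \rho) + \rho, \qquad z = e^{-\beta\lambda}\,(2\pi m\beta^{-1}h^{-2})^{3/2},\\
\frac{d}{d\rho}\big[\rho(\log z - \log\rho) + \rho\big] &= \log z - \log \rho = 0
 \ \Longrightarrow\ \rho = z, \qquad \beta p = \rho .
\end{aligned}
```

The maximum value equals the density, i.e. the perfect gas law $\beta p = \rho$ is recovered, in agreement with the interpretation of the maximum as the pressure.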

The variational property (5.5.13) has been heuristically based on the men-
tioned check (see (5.5.5), (5.5.7)) that the functional (5.5.5) leads, in a finite
volume container with “open boundary conditions” (i.e. no fixed particles
outside the container), to the element with parameters $(\beta,\lambda)$ of the grand
canonical ensemble. However a remarkable feature of (5.5.13) is that the
solutions of the maximum problem (5.5.12) nevertheless contain, as well, the
translation invariant Gibbs distributions that can be obtained by imposing
general fixed external particle boundary conditions and subsequently con-
sidering the thermodynamic limit of the distributions so obtained, [Do68a],
[Do68b], [Do72], [LR69].

6 In the quoted papers one considers lattice systems, see §5.10 below, but the techniques
and results can be extended to hard-core systems quite straightforwardly.

It can be checked (this is a simple but nontrivial theorem) that the set of
translation-invariant probability distributions that realize the maximum in
(5.5.13) (i.e. the set denoted $G(\beta,\lambda)$ of the Gibbs distributions with inverse
temperature $\beta$ and chemical potential $\lambda$) forms a convex set (i.e. $\mu_1, \mu_2 \in
G(\beta,\lambda)$ implies $a\mu_1 + (1-a)\mu_2 \in G(\beta,\lambda)$ for all $a \in (0,1)$). Furthermore
the convex set is actually a simplex, i.e. such that every $\mu \in G(\beta,\lambda)$ can be
represented uniquely as a convex superposition of extremal distributions in
$G(\beta,\lambda)$.

The statistical interpretation of a (convex) superposition of two probability
distributions is that of a mixture: hence the property just described has an
interesting meaning. It says, in other words, that if the extremal distri-
butions of $G(\beta,\lambda)$ are interpreted as the pure homogeneous (i.e. translation
invariant) phases, then all the other elements (“homogeneous phases”) in
$G(\beta,\lambda)$ are mixtures of pure phases and they can be represented as such in
a unique way.

For instance if $G(\beta,\lambda)$ contains only two extremal elements $\mu_+$ and $\mu_-$, the
first representing the “liquid phase” and the second the “gaseous phase”,
then every other distribution in $G(\beta,\lambda)$ can be represented as $a\mu_+ + (1 -
a)\mu_-$ with $0 < a < 1$, and $a$ has the interpretation of fraction of mass of the
liquid phase.
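
The uniqueness of the decomposition can be illustrated at the level of a single observable. In the sketch below (hypothetical names; the densities of the two pure phases are assumed to be known and different) the average density of the mixture is linear in the coefficient $a$, so $a$ can be read back from it.

```python
def mixture_density(a, rho_liquid, rho_gas):
    """Average density in the mixture a*mu_plus + (1 - a)*mu_minus."""
    return a * rho_liquid + (1.0 - a) * rho_gas


def mixture_coefficient(rho, rho_liquid, rho_gas):
    """Coefficient a recovered from the average density of the mixture."""
    if rho_liquid == rho_gas:
        raise ValueError("the two pure phases must have different densities")
    a = (rho - rho_gas) / (rho_liquid - rho_gas)
    if not 0.0 <= a <= 1.0:
        raise ValueError("density is not between the two pure phase values")
    return a
```

This is only the one-observable shadow of the simplex property: the theorem states that the whole distribution, not just its average density, determines the coefficients of the pure phases uniquely.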

It is remarkable that it is possible to prove that the extremal states $\mu$ of
$G(\beta,\lambda)$ enjoy the property of ergodicity, in the sense that the corresponding
dynamical systems $(M, S, \mu)$ defined above, see §5.4, are “ergodic”, and they
are the only points in $G(\beta,\lambda)$ with this property, see [Ru69].

The ergodicity property is the natural generalization of the notion intro-
duced in the discrete evolution cases of the systems in Chap.I. We consider
a family of commuting invertible transformations⁷ $S = (S_1,\dots,S_n)$ acting
on a space $M$ so that setting $S^k = S_1^{k_1} \cdot S_2^{k_2} \cdots S_n^{k_n}$, with $k = (k_1,\dots,k_n)$
an $n$-tuple of integers, we can define $S^k x$ for $x \in M$. We say that a probability
distribution $\mu$ on $M$ is $S$-invariant if for every measurable set $E \subset M$ it is
$\mu(S^k E) = \mu(E)$. The triple $(M, S, \mu)$ is called a discrete dynamical system
and $S^k$ is called a “translation of $k$ by $S$” (here $S^0$ is the identity map),
[AA68]:


Definition (ergodicity): Let $(M, S, \mu)$ be a discrete dynamical system; it is
“ergodic” if there are no nontrivial constants of motion, i.e. no measurable
functions $x \to F(x)$ on the phase space $M$ which are invariant under “trans-
lation by $S$” and which are not constant as $x$ varies excluding, possibly, a
set of zero $\mu$-probability.


The definition can be extended in the obvious way to the case in which
$S$ is a continuous flow, i.e. $k \in R^n$ and $S^k$ are commuting transformations
which satisfy the group property $S^h \cdot S^k = S^{h+k}$ for all $h, k \in R^n$ and $S^0$ is
the identity map.

The above dynamical system with $M$ being the phase space points of an
infinite system, with the “evolution” $S$ being the spatial translations and
with $\mu$ being a Gibbs state, is ergodic if one cannot find observables that are
translation invariant and at the same time not constant (outside a possible
set of zero $\mu$-probability).

If we sample the system configurations from an ergodic distribution $\mu$ we
must find that the translation-invariant observables always have the same
value. Thus for instance the global density (or specific volume) will always
have the same value on all configurations that are sampled with distribution
$\mu$.

We see that because of the above-quoted theorem of unique decomposabil-
ity of the elements of $G(\beta,\lambda)$ into extremal distributions the extremal points
of $G(\beta,\lambda)$ deserve the name of pure phases as there is no way to see that
they consist of different configurations by measuring global, translation-
invariant, properties that they enjoy.

If, instead, a probability distribution in $G(\beta,\lambda)$ is not pure but it is a
mixture of, say, two pure states with different densities and with coefficients
$a$ and $1 - a$ then by sampling the system configurations we may get either
a configuration of the dense phase or one of the rarefied phase; so that
the global density, which is a translation-invariant observable, can have two
distinct values (each with a probability of occurrence in samples given by $a$
and $1 - a$ respectively), so that it is not constant.


7 We shall only consider here and in the rest of the book measurable transformations,
measurable functions, measurable sets. These are rather delicate notions, on the brink of
the imponderable because to find nontrivial examples of nonmeasurable corresponding
objects one needs the sinister axiom of choice. However if one wants to discuss notions
like ergodicity in systems that are not regarded as discretized, abandoning Boltzmann’s
wise discrete conception of the world, one must say a few words on measurability. The
spaces $M, M', \dots$ that we consider here and later will all have a natural notion of “close-
ness” between points, a topology in Mathematics: typically a metric can be defined
on them (this metric can be defined but often it is not really useful so that it is not
always explicitly defined as there is little doubt about what it could be). Therefore it
makes sense to define open sets. One declares all of them measurable: more generally
the smallest family of sets that contains all the open sets and that is closed under the
operations of countable union, complementation and intersection is by definition the
family of measurable sets, or Borel sets. A measurable transformation $S$ is a map of $M$
into $M'$ such that $S^{-1}E$ is measurable for any measurable $E$. Therefore it makes sense
to say that a function (i.e. a map of $M$ to $R$) is measurable. A probability distribution,
or a “normalized measure”, on $M$ is a function defined on the measurable sets with
values $\mu(E) \ge 0$ and which is additive (i.e. if $E = \cup_n E_n$ and the $E_n$’s are pairwise
disjoint, then $\mu(E) = \sum_n \mu(E_n)$), and such that $\mu(M) = 1$. Given a distribution $\mu$ on $M$
one calls $\mu$-measurable any set in the smallest collection of sets, closed under countable
union, complementation and intersection, that contains the measurable sets as well as
any other set that can be enclosed into a measurable set with $0$ $\mu$-measure: the latter
are called “$0$ $\mu$-measure sets”. Likewise we can define $\mu$-measurable functions and $\mu$-
measurable maps. The above notion of $\mu$-measurability should not be confused with the
previous notion of measurability. Why the name “measurable”? Because a measurable
function of one variable is the most general function for which it is possible to set up,
in principle, a table of values, i.e. a function that can be approximated by piecewise
constant functions.

An important general theorem for dynamical systems is Birkhoff’s ergodic
theorem, [AA68]:


Theorem (Birkhoff): Let $(M, S, \mu)$ be a discrete dynamical system and let $\Lambda$
be a cube of side $L$. Then for all $\mu$-measurable functions $f$ on $M$ with
$\int_M \mu(dy)\,|f(y)| < \infty$ the limit
$\lim_{L\to\infty} \frac{1}{|\Lambda|} \sum_{k \in \Lambda} f(S^k x) = \bar f(x)$ exists apart, possibly, from a set of $x$’s
of zero $\mu$-probability. Hence if $(M, S, \mu)$ is ergodic $\bar f(x)$ is a constant for
$\mu$-almost all $x$ and, therefore, it is equal to $\int_M \mu(dy) f(y)$.


The above statement can also be formulated, and remains valid, for the case
in which $S$ is a flow (i.e. $k \in R^n$ is a continuous vector).
Another consequence of the ergodicity is that particles located in two far
apart cubes are observed as if they were independently distributed, at least on
the average over the boxes locations. This is also a property that intuitively
should characterize the physically pure homogeneous phases.⁸
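
A standard toy example, not taken from the text, makes the dichotomy in Birkhoff’s theorem visible: the rotation $x \to x + \alpha \bmod 1$ of the circle with the uniform distribution (here $n = 1$). The sketch below, with hypothetical names, computes the averages over $k = 0,\dots,L-1$; for irrational $\alpha$ the system is ergodic and the average tends to the space average, while for rational $\alpha$ the limit depends on the starting point.

```python
import math


def birkhoff_average(f, x0, alpha, n_steps):
    """Average of f over the first n_steps points of the orbit of x0
    under the rotation x -> x + alpha (mod 1)."""
    total = 0.0
    x = x0
    for _ in range(n_steps):
        total += f(x)
        x = (x + alpha) % 1.0
    return total / n_steps


def observable(x):
    """A smooth observable on the circle; its space average is 1/2."""
    return math.cos(2.0 * math.pi * x) ** 2
```

With $\alpha = 1/2$ the observable above is itself invariant under the rotation and not constant, i.e. it is a nontrivial constant of motion in the sense of the definition of ergodicity; this is why the averages in that case remember the initial point.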


§5.6. Other Characterizations of Gibbs Distributions. The DLR Equations


Via the variational principle (5.5.13) one finds all the translation invariant
Gibbs distributions, but on physical grounds, as remarked, we expect that
there may also exist, under suitable circumstances, nontranslation-invariant
Gibbs distributions; i.e., with the notation of §5.4, in general we shall have
that $G^0(\beta,\lambda)$ contains $G(\beta,\lambda)$, but it does not coincide with $G(\beta,\lambda)$.

Therefore it is useful to look also for other characterizations of Gibbs states
which do not “discriminate” the nontranslation-invariant states. Such a
characterization is possible and is suggested by a heuristic argument based
on the finite volume grand canonical distribution $\mu^{(\beta,\lambda,V)}$ without external
fixed particles, (5.3.2).


8 A simple abstract argument proves the statement. Let $\mu \in G(\beta,\lambda)$ be ergodic and
denote by $\rho(\Delta)$ the average over the configurations $x$ of the number $N_\Delta(x)$ of particles
in the unit cube $\Delta$ with respect to the distribution $\mu$; let $\rho(\Delta,\Delta')$ be the average of
the product of the number of particles in the unit cube $\Delta$ times that in the unit cube
$\Delta'$. The translation invariance of $\mu$ implies that $\rho(\Delta)$ is independent of the location
of the unit cube $\Delta$ and that $\rho(\Delta,\Delta')$ depends only on the relative position of the unit
cubes $\Delta$ and $\Delta'$. If $\Lambda$ is a large volume paved by unit cubes $\Delta$ the average number
of particles in $\Lambda$ will be $\langle N_\Lambda \rangle = \sum_{\Delta \subset \Lambda} \rho(\Delta)$ and the average $\langle N_\Lambda^2 \rangle$ will be
$\sum_{\Delta,\Delta' \subset \Lambda} \rho(\Delta,\Delta')$. Given a configuration $x$, the limit $\rho$ as $\Lambda \to \infty$ of $N_\Lambda(x)/|\Lambda|$
will exist, possibly outside of a set of configurations with $\mu$-probability 0: since this
limit (when it exists) is obviously translation invariant as a function of $x$, it must
be a constant (possibly outside a set of 0 probability, by Birkhoff’s theorem above;
nonconstancy would be against ergodicity). For the same reason also
$N_\Lambda(x)(N_\Lambda(x)-1)/|\Lambda|^2$ will have a limit equal to that of $N_\Lambda(x)^2/|\Lambda|^2$, which has
to be constant and therefore equal to $\rho^2$.
Hence $\frac{1}{|\Lambda|^2} \sum_{\Delta,\Delta'} \rho(\Delta,\Delta') - \big(\frac{1}{|\Lambda|} \sum_{\Delta} \rho(\Delta)\big)^2 \to 0$ as $\Lambda \to \infty$, which means that
$\rho(\Delta,\Delta') = \rho(\Delta)\rho(\Delta')$ “on the average over $\Delta, \Delta'$”.

We ask: given $V^0 \subset V$ what is the probability of finding inside $V^0$ exactly
$g$ particles in the positions $q_1,\dots,q_g$, knowing that out of $V^0$ the particles
are located in the positions $q_{g+1}, q_{g+2}, \dots$?

Denoting by $\bar f_{V^0}(q_1,\dots,q_g \,|\, q_{g+1}, q_{g+2}, \dots) = \bar f_{V^0,g}(q|q')$ the density of this
conditional probability, where $q$ abbreviates $(q_1,\dots,q_g)$ and $q'$ abbreviates
$(q_{g+1}, q_{g+2}, \dots)$, then it is immediate to deduce from (5.3.2) that:

$$\bar f_{V^0,g}(q|q') = \frac{\gamma(\beta)^{3g} \exp\Big[-\beta\lambda g - \beta \sum_{i<j\le g} \varphi(q_i - q_j)
 - \beta \sum_{i \le g} \sum_{j > g} \varphi(q_i - q_j)\Big]}{\text{normalization}} \tag{5.6.1}$$

where $\gamma(\beta) = (2\pi m \beta^{-1})^{1/2}$ (see (5.5.2)) and the normalization is deter-
mined by imposing the condition that $\bar f_{V^0}$ defines a probability distribution,
i.e. that:

$$\sum_{g=0}^\infty \int \bar f_{V^0,g}(q|q')\,\frac{dq}{g!} = 1. \tag{5.6.2}$$

This relation depends on the total volume $V$ only because $q_{g+1}, q_{g+2}, \dots$, i.e. the
particles of the configuration external to $V^0$, are constrained to be in $V$,
i.e. in the global container of the system (and outside $V^0$).

It is therefore natural to define, as an alternative to §5.3, §5.4, a (infinite
volume) Gibbs distribution on $M$ with parameters $(\beta,\lambda)$ as a distribution
$\mu$ on $M$ Maxwellian in the momenta and for which the probability for the
event in which the particles inside a fixed finite volume $V^0$ are in $dq$,
conditional to knowing that the particles outside the box $V^0$ are in $q_{g+1}, q_{g+2}, \dots$
(with any momenta), is given by (5.6.1) without any restriction that the
particles at $q_{g+1}, q_{g+2}, \dots$ be inside a larger container $V$ (because the latter has,
now, to be thought as infinite).

This reading of (5.6.1) is known as the DLR equation and it was proposed 
as a very general definition of Gibbs state (in the thermodynamic limit) by 
Dobrushin, Lanford, Ruelle, [Do68], [LR69]. 
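
The structure of the DLR condition is easiest to see in a lattice caricature, where it can be checked by brute force. The sketch below is not the continuum statement: it assumes a one-dimensional lattice gas on a ring, with occupation numbers 0 or 1, nearest neighbor coupling and weights $e^{-\beta(\Phi + \lambda n)}$ (same sign convention for $\lambda$ as in (5.3.2)); all names are hypothetical. It verifies that the probability that a site is occupied, conditional on all the other sites, depends only on the two neighbors and has the exponential form analogous to (5.6.1).

```python
import math
from itertools import product


def weight(config, beta, lam, coupling):
    """Gibbs weight exp(-beta*(Phi + lam*n)) of a configuration on a ring."""
    n_sites = len(config)
    pairs = sum(config[i] * config[(i + 1) % n_sites] for i in range(n_sites))
    return math.exp(-beta * (coupling * pairs + lam * sum(config)))


def conditional_by_enumeration(site, outside, beta, lam, coupling):
    """P(site occupied | all other occupations), from the full weights.

    `outside` is the tuple of the other occupation numbers, in order.
    """
    occupied = outside[:site] + (1,) + outside[site:]
    empty = outside[:site] + (0,) + outside[site:]
    w1 = weight(occupied, beta, lam, coupling)
    w0 = weight(empty, beta, lam, coupling)
    return w1 / (w0 + w1)


def conditional_by_dlr(left, right, beta, lam, coupling):
    """Same conditional probability, written in the local (DLR-like) form."""
    w1 = math.exp(-beta * (lam + coupling * (left + right)))
    return w1 / (1.0 + w1)
```

The local form no longer refers to the size of the ring, which is what makes it meaningful as a definition directly in infinite volume.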

This is important because one can establish, quite generally, the theo-
rem that Gibbs distributions, defined as the probability distributions on
$M$ which are Maxwellian in the velocities (with the same inverse tempera-
ture parameter $\beta$) and which satisfy (5.6.1), coincide with the distributions
in $G^0(\beta,\lambda)$ defined via the thermodynamic limit in the previous sections,
whether or not they are translation invariant.

This is a theorem that holds as stated in the case of hard core systems; its 
validity in more general situations still presents a few technical problems to 
be understood although various weak versions of it exist in most cases of 
interest (e.g. in the case of superstable potentials), [LP76].


§5.7. Gibbs Distributions and Stochastic Processes


By integration of the momentum coordinates $p$, the probability distri-
butions $\mu$ on the “infinite volume” phase space $M$ define corresponding
probability distributions on the space $M_1$ of the position configurations
of infinitely many particles $q$. Thus Gibbs distributions integrated over
the momentum variables (which can be “disregarded” as playing a trivial
mathematical role from the point of view of the description of the states,
although they are physically very important), provide us with an interesting
class of distributions on $M_1$ which we shall still simply call Gibbs distribu-
tions (rather than using a pedantic distinction between Gibbs distributions
and configurational Gibbs distributions).

In general the probability distributions $\mu$ on $M_1$ are known in probability
theory as stochastic point processes because a point $q \in M_1$ in fact describes
a family of points, i.e. particles located in $q_1, q_2, \dots$ in $R^3$, if $q = (q_1, q_2, \dots)$.

The remark permits us to give a new physical interpretation to several 
results of the general theory of point stochastic processes, and mainly it 
induces a translation of problems relevant for physics into interesting math- 
ematical problems in the theory of point stochastic processes. 

The issue that is, perhaps, central is to show that there exist simple choices
of the interparticle potentials $\varphi$, assumed with hard core for simplicity,
and of the parameters $\beta, \lambda$ for which the variational principle or the DLR
equations admit more than one solution.

This is the same as the problem of the existence of phase transitions in a 
homogeneous system of identical particles: in fact we have argued that the 
physical pure phases that can coexist can be identified with the solutions of 
the variational principle or of the DLR equations. 

We have seen above that the van der Waals theory provides us with an
affirmative answer to this issue; however it is rather unsatisfactory and,
to date, there is still no example that can be treated without uncontrolled
approximations (i.e. without introducing ad hoc hypotheses at the “right
moment”). The above nice thermodynamic formalism might be empty, after
all: but this possibility is really remote, and it is certainly not realized in
models that are somewhat simpler than the ones so far used for continuous
gases: see Chap.VI and the recent breakthrough in [LMP98].

Other remarkable problems that arise in the theory of stochastic processes 
and, independently, in the theory of phase transitions are related to ques- 
tions of scale invariance. 

From experience and from the phenomenological theories of phase transi-
tions not only does the hypothesis emerge that the liquid-gas transition re-
ally takes place whenever the interaction potential $\varphi$ has, besides a repulsive
core, an attractive tail, but also the hypothesis that such a transition has a
critical point $(\lambda_c, \beta_c)$ where the Gibbs distribution (and the corresponding
stochastic process) $\mu$ has special scaling properties, [Fi98], [BG95].

More precisely imagine that we pave the ambient space $R^3$ with a lattice of
cubes $Q^L_n$, with side $L$ and parameterized by three integers $n = (n_1,n_2,n_3)$,
so that the cube $Q^L_n$ consists of the points with coordinates $n_h L \le x_h <
(n_h + 1)L$, $h = 1,2,3$.

Define the family of variables (i.e. “functions” on phase space, in proba-
bility theory language, or “observables” in physics language) $\sigma_n$ on $M$:

$$\sigma_n = \big[(\text{particles number in } Q^L_n) - \langle \text{particles number in } Q^L_n \rangle_\mu\big]\, L^{-\delta} \tag{5.7.1}$$

where $\delta$ is a parameter to be chosen.

One gets a stochastic process, i.e. a probability distribution on a space of
states consisting of the sequences $\{\sigma_n\}$ indexed by $n \in Z^3$, in which the
“states at the site $n$” are real numbers labeled by $n$ and defined by (5.7.1).⁹


By these phenomenological theories of the critical point, we may expect
that, in the limit as $L \to \infty$ and if $\delta$ is suitably chosen, the stochastic
process describing the distribution of the variables $\sigma_n$ tends to a limiting
process such that, in the limit, the $\sigma_n$ can be represented as:

$$\sigma_n = \int_{Q^1_n} \psi(x)\, dx \tag{5.7.2}$$

where $\psi(x)$ are random variables: this is a “stochastic process on $R^3$” with
homogeneous correlation functions; i.e. for every $k$ and $x_1, x_2, \dots, x_k$:

$$\langle \psi(x_1) \dots \psi(x_k) \rangle = \text{homogeneous function of } (x_1,\dots,x_k) \tag{5.7.3}$$

if $\langle \cdot \rangle$ denotes the operation of evaluation of the average value (also called
“expectation” in Probability Theory) with respect to the distribution of the
random variables $\psi$.
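
For orientation, the free gas gives the simplest (non critical) instance of this scaling picture. The sketch below assumes that the positions form a Poisson point process of density $\rho$ and writes $N_n$ for the number of particles in the cube of side $L$ labeled by $n$ and $Q^1_n$ for the unit cube with the same label; the normalization $L^{3/2}$ is the one singled out by the central limit theorem in this case and is not claimed to be the one relevant at a critical point. Here $\delta_{nm}$ is the Kronecker symbol and $\delta(x-y)$ the Dirac delta, both unrelated to the parameter of (5.7.1).

```latex
\begin{aligned}
\langle N_n\rangle &= \rho L^3, \qquad
\langle (N_n - \rho L^3)(N_m - \rho L^3)\rangle = \rho L^3\,\delta_{nm},\\
\sigma_n &= \frac{N_n - \rho L^3}{L^{3/2}}
\ \Longrightarrow\ \langle\sigma_n\sigma_m\rangle = \rho\,\delta_{nm}
 = \int_{Q^1_n}\!\int_{Q^1_m} \rho\,\delta(x-y)\,dx\,dy ,\\
\langle\psi(x)\psi(y)\rangle &= \rho\,\delta(x-y), \qquad
\delta(\kappa(x-y)) = \kappa^{-3}\,\delta(x-y).
\end{aligned}
```

The limiting process is Gaussian white noise, whose two point function is homogeneous of degree $-3$: a trivial example of (5.7.3). The interest of the critical point lies in the expectation that a different normalization leads to a nontrivial homogeneous limit.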

Since no nontrivial examples of point stochastic processes with the above 
properties are known (or, better, were known until recently) one under- 
stands the interest, even from a purely mathematical viewpoint, of the the- 
ory of phase transitions which in its heuristic aspects provides a solution to 
various problems related to the existence and structure of stochastic pro- 
cesses. The heuristic results suggest in fact very challenging mathematical 
conjectures, and some ideas for their understanding (often only partial), so 
that the subject continues to attract the attention of many, [Wi83], [WF72], 
[Ga76], [Fi98]. 

It seems fair to say that the tumultuous development of statistical me- 
chanics and of the theory of phase transitions has literally revolutionized 
the theory of probability as well. 

We conclude by mentioning (we shall come back to this point later) that so far
we have only discussed the properties of the Gibbs states as equilibrium
states, but without ever introducing the dynamics. We have regarded them
as dynamical systems with respect to space translations and we have seen
that this leads to the mathematical definition of pure phase. One may
wonder whether one could obtain similarly interesting notions by regarding
the Gibbs states as stationary states for time evolution as well (i.e. for
translations in time). This is a much harder question and we defer discussing
it to Chap.IX.


9 More generally one calls stochastic process a probability distribution on a space of states
consisting of families of variables, called random variables, indexed by an arbitrary label;
and both the label and the labeled variables can be in any space, a ghastly generality.


§5.8. Absence of Phase Transitions: d = 1. Symmetries: d = 2


(A) One dimension. After the above general analysis and after setting up a
formalism well suited for our programs we return to more concrete questions.
We begin by showing that, as already stated several times, one dimensional
systems with finite range interactions cannot have phase transitions of any
sort, unless they are considered in the somewhat unphysical situation of having
zero absolute temperature. Again we limit ourselves to the simple case of
hard core interactions and call $r_0$ the hard core size (so that $\varphi(r) = +\infty$ if
$r < r_0$).

We shall use here as a definition of phase transition the presence of a hor- 
izontal segment in the graph of the pressure as a function of the specific 
volume at constant temperature. But other definitions could be used, e.g. 
the inequivalence of some of the ensembles and the dependence of the ther- 
modynamic limit on the boundary conditions, discussed in §2.5 and above, 
in the present chapter. 

Consider first the case in which the potential vanishes beyond $r = 2r_0$
where $r_0$ is the hard core radius: this case is particularly easy and is called
the “nearest neighbor” interaction case.

It is best to use the pressure ensemble, see §2.5, (2.5.6), (2.5.16), with the
volume $V$ taking the continuum of values between 0 and $\infty$.

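Then the partition function in the pressure ensemble, see (2.5.17), (2.2.1),
is

$$J_N(\beta,p) = \Big(\frac{\sqrt{2\pi m\beta^{-1}}}{h}\Big)^N \int_0^\infty \frac{dL}{L_0}\, e^{-\beta p L} \int_{0<q_1<\ldots<q_N<L} dq_1\ldots dq_N \prod_{j=1}^{N-1} e^{-\beta\varphi(q_{j+1}-q_j)}. \tag{5.8.1}$$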


We shall use the fact that the interaction cannot extend beyond the nearest
neighbor and we label the particles $1,\ldots,N$ so that $q_1 < q_2 < \ldots < q_N$. In
this way we restrict the integration domain by a factor $N!$. Thus, extending
the integral to the region $0 < q_1 < q_2 < \ldots < q_N < L$ we get rid of the
$N!^{-1}$ present in the definition of the partition function.

The momentum integration yields the square root in front of the integral (it
is raised to the power 1 rather than the usual 3 because the space dimension
is now 1). The length $L_0$ is an arbitrary dimensional factor (see (2.5.17)).


Then we note that $L = q_1 + \sum_{j=1}^{N-1}(q_{j+1} - q_j) + (L - q_N)$ and, introducing
the variables $q_{j+1} - q_j$ as independent variables, we see that:


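$$J_N(\beta,p) = \Big(\frac{\sqrt{2\pi m\beta^{-1}}}{h}\Big)^N \frac{1}{L_0\,(\beta p)^2}\Big(\int_0^\infty e^{-\beta p q}\,e^{-\beta\varphi(q)}\,dq\Big)^{N-1} \tag{5.8.2}$$


so that the thermodynamic limit $\lim_{N\to\infty}\frac{1}{N}\log J_N(\beta,p)$ is


$$-\beta\lambda(\beta,p) = \log\Big(\frac{\sqrt{2\pi m\beta^{-1}}}{h}\int_0^\infty e^{-\beta p q}\,e^{-\beta\varphi(q)}\,dq\Big). \tag{5.8.3}$$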


Equivalence between pressure ensemble and canonical ensemble is worked
out along the same lines along which, in §2.5, equivalence between canonical
and grand canonical ensembles (hence orthodicity) was derived.

One finds, as mentioned in §2.5, that the quantity $p$ can be identified with
pressure and $\lambda(\beta,p)$ can be identified with the Gibbs potential $u - Ts + pv$
(see (2.5.11)), and $\beta = 1/k_BT$. Moreover the equation of state is derived
(by using the orthodicity) from the thermodynamic relation $(\partial\lambda/\partial p)_\beta = v$.

Relation (5.8.3) implies that the Gibbs potential $\lambda(\beta,p)$ is analytic in $\beta, p$
for $\beta, p > 0$; and its derivative $(\partial\lambda/\partial p)_\beta$ is strictly monotonic in $p$ so
that the relation $(\partial\lambda/\partial p)_\beta = v$
implies that pressure is analytic and strictly monotonic (decreasing) in $v$:
hence the equation of state cannot have any phase transition plateau.
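
The last statement can be checked numerically. The following is a minimal sketch, not taken from the text: from (5.8.3) the relation $(\partial\lambda/\partial p)_\beta = v$ gives $v$ as the average of $q$ with respect to the weight $e^{-\beta p q}e^{-\beta\varphi(q)}$ on $q > r_0$, and the code evaluates this average by a Simpson rule. The names `specific_volume`, `hard_rods` and `square_well`, the cut-off of the integral and the choice of a square well are assumptions made only for illustration.

```python
import math

def specific_volume(beta, p, phi, r0, n=20000):
    """Return v = <q>, the mean of q under exp(-beta*p*q - beta*phi(q)) on q > r0.

    Simpson rule on [r0, r0 + 40/(beta*p)]; once phi vanishes at large q the
    neglected tail has relative size of order exp(-40). n must be even.
    """
    upper = r0 + 40.0 / (beta * p)
    h = (upper - r0) / n
    num = 0.0
    den = 0.0
    for i in range(n + 1):
        q = r0 + i * h
        if i == 0 or i == n:
            w = 1.0
        elif i % 2 == 1:
            w = 4.0
        else:
            w = 2.0
        # The exponent is shifted by beta*p*r0: the shift cancels in num/den.
        weight = w * math.exp(-beta * p * (q - r0) - beta * phi(q))
        num += q * weight
        den += weight
    return num / den

def hard_rods(q):
    """No interaction beyond the hard core (illustrative assumption)."""
    return 0.0

def square_well(q, r0=1.0, depth=1.0):
    """Attractive well of given depth for q < 2*r0, zero beyond (assumption)."""
    return -depth if q < 2.0 * r0 else 0.0

if __name__ == "__main__":
    for p in (0.1, 0.5, 1.0, 2.0, 5.0):
        print(p, specific_volume(1.0, p, hard_rods, 1.0),
              specific_volume(1.0, p, square_well, 1.0))
```

For pure hard rods the integral is elementary and gives $v = r_0 + 1/(\beta p)$, which the routine reproduces; with the attractive well $v$ still decreases strictly as $p$ grows, i.e. no plateau appears.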

The above analysis is a special case of Van Hove’s theorem, which holds 
for interactions extending beyond the nearest neighbor, see Appendix 5.A1 
below, and it played an important role in making clear that short-range 
one-dimensional systems cannot undergo phase transitions, [VH50]. Further 
extensions can be found in [GMR69]. 

If one adopts the definition of phase transition based on sensitivity of
the thermodynamic limit to variations of boundary conditions one can
give a more general, conceptually simpler, argument to show that in
one-dimensional systems there cannot be any phase transition if the potential
energy of mutual interaction between a configuration $q$ of particles to the
left of a reference particle (located at the origin $O$, say) and one
configuration $q'$ to the right of the particle (with $q \cup O \cup q'$ compatible with the hard
cores) is uniformly bounded.

The argument, due to Landau, is simply that, in this case, the distribution 
of the configurations to the right of a point and to the left of it are essentially 
independent: hence by changing the configuration of fixed particles outside 
a box one does not alter appreciably the probability distribution inside it. 

This is so because the weight of a configuration $q$, consisting of a part $q_-$
to the left of the origin and of a part $q_+$ to the right of it, is the exponential
of $-\beta H$, if $H$ is the energy of the configuration. But the energy of such a
configuration is the sum of two quantities (large, of the order of the volume
occupied by the configurations), namely the energies that each
of the two parts $q_-$ and $q_+$ would have in the absence of the other part,
plus the mutual energy. The latter is, however, bounded independently of
the choice of $q_-$ and $q_+$.
In itself this does not immediately imply that there can be no dependence
on boundary conditions because a finite ratio between two probabilities is
not the same thing as a ratio close to 1: hence the argument has to be refined
by considerations that show that it also implies actual boundary condition
independence. I find it easier to just give analytic details on the above
argument (see Appendix 5.A1 below) rather than indulging in heuristic
discussions. The argument clearly shows a mechanism responsible for the
“loss” of memory of the boundary conditions as one proceeds in from the
boundary down to the center of a finite interval $[0, L]$.

Hence the larger the box the smaller is the influence of the external particles 
on the bulk of the particles in the box: hence no inequivalence between the 
ensembles can arise, i.e. no phase transitions in that sense. One also says 
that no long-range order can be established in such systems, in the sense 
that one loses memory of the boundary conditions as the boundaries recede 
to infinity in the process of taking the thermodynamic limit. 
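
The loss of memory of the boundary conditions can be made quantitative in the simplest discrete example. The sketch below is an illustration under assumptions of ours (it is not the continuous hard core gas discussed above): a nearest neighbor Ising chain with energy $-J\sum_i\sigma_i\sigma_{i+1}$, two fixed boundary spins and a central spin at distance $d$ from each of them. The function name `center_magnetization` is hypothetical; the average of the central spin is computed by brute force summation.

```python
import itertools
import math

def center_magnetization(d, beta_J, left=+1, right=+1):
    """<sigma_center> for a nearest neighbor Ising chain with fixed end spins.

    Sites 0 and 2*d carry the fixed boundary spins; the 2*d - 1 interior
    spins are summed over. Energy: -J * sum_i s_i * s_{i+1}, beta_J = beta*J.
    """
    n_free = 2 * d - 1
    num = 0.0
    den = 0.0
    for free in itertools.product((-1, 1), repeat=n_free):
        chain = (left,) + free + (right,)
        bonds = sum(chain[i] * chain[i + 1] for i in range(2 * d))
        w = math.exp(beta_J * bonds)
        num += chain[d] * w   # chain[d] is the central spin
        den += w
    return num / den

if __name__ == "__main__":
    for d in range(1, 7):
        plus = center_magnetization(d, 0.7)
        minus = center_magnetization(d, 0.7, -1, -1)
        print(d, plus - minus)
```

With $t = \tanh\beta J$ the result agrees with the elementary transfer matrix value $\pm 2t^d/(1+t^{2d})$ for boundary spins both $+$ or both $-$, so that the influence of the boundary decays exponentially in $d$ at any positive temperature.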

Note that the argument above fails if the space dimension is $\ge 2$: in this
case even if the interaction is short ranged the energy of interaction between
two regions of space separated by a boundary is of the order of the boundary
area. Hence one cannot bound above and below the probability of any two
configurations in two half-spaces by the product of the probabilities of the
two configurations, each computed as if the other was not there (because the
bound would be proportional to the exponential of the surface of separation,
which tends to $\infty$ when the surface grows large). This means that we cannot
consider, at least not in general, the configurations in the two half-spaces
as independently distributed.

Analytically a condition sufficient to imply that the energy between a
configuration to the left and one to the right of the origin is bounded above, if
the dimension $d$ is $d = 1$, is simply expressed (as it is easy to check) by:

$$\int_{r_0}^{\infty} r\,|\varphi(r)|\,dr < +\infty \tag{5.8.4}$$


One usually says, therefore, that in order to have phase transitions in d = 1 
systems one needs a potential that is “so long range” that it has divergent 
first moment. It can be shown by counterexamples that if the condition 
(5.8.4) fails there can be phase transitions even in one-dimensional systems, 
at least in further simplified models, [Dy69]. In fact very recently the first 
phase transition in a continuous system and in the absence of symmetry 
breaking has been proved to occur precisely in a system violating (5.8.4), 
see [Jo95]. 
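
To see what (5.8.4) requires of the tail of the potential one can take, as a purely illustrative assumption, $|\varphi(r)| = r^{-\alpha}$ for $r > r_0$ and compute the truncated first moment $\int_{r_0}^{R} r\,|\varphi(r)|\,dr$ in closed form. The function name `first_moment` is ours.

```python
import math

def first_moment(alpha, r0, R):
    """Integral of r * r**(-alpha) over r0 < r < R, i.e. the truncated
    first moment of the assumed power law tail |phi(r)| = r**(-alpha)."""
    if alpha == 2.0:
        return math.log(R / r0)
    return (R ** (2.0 - alpha) - r0 ** (2.0 - alpha)) / (2.0 - alpha)

if __name__ == "__main__":
    for alpha in (3.0, 2.0, 1.5):
        print(alpha, [first_moment(alpha, 1.0, R) for R in (1e2, 1e4, 1e6)])
```

The moment stays bounded as $R$ grows only if $\alpha > 2$; for $\alpha = 2$ it grows like $\log R$ and for $\alpha < 2$ like a power of $R$, which is the sense in which a potential violating (5.8.4) is “so long range”.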

The arguments of this section apply also to discrete models like lattice gases
or lattice spin models, see §6.2 below and Chap.VI, Chap.VII, and §9.7.


(B) Symmetries. By symmetry one means a group of transformations acting 
on the configurations of a system subject to some boundary condition (e.g. 
periodic or open) and transforming each of them into configurations with 
the same energy and with the same boundary condition. 
Systems with “too much symmetry” sometimes cannot show phase transitions.
This is best discussed if one uses as definition of phase transition the
existence of long-range order.

The latter is defined by considering a localized observable $F$ (see (5.4.2))
which has zero average $\langle F\rangle$ in all Gibbs states obtained as thermodynamic
limits, with suitable boundary conditions (e.g. periodic or open), because
of their symmetry properties.

Suppose that the average value $\langle F\cdot\tau_\xi F\rangle$ of the product of the observable
times a translate$^{10}$ by $\xi$ of itself, a quantity called the spatial autocorrelation
of the observable, does not approach 0 as $\xi \to \infty$. Then one says that the
system shows long-range order for the order parameter $F$.

The symmetry is continuous if the group of transformations is a continuous
group. For instance continuous systems have translational symmetry if
considered with periodic boundary conditions, so that the number $n_\Delta$ of
particles in a small box $\Delta$ is a local observable and such is also
$\nu_\Delta \stackrel{def}{=} (n_\Delta - \overline{n}_\Delta)$
where $\overline{n}_\Delta$ is the average over translations. For symmetry reasons this
quantity has zero average in the Gibbs states associated with the Hamiltonian
describing the system with the symmetric boundary condition (periodic in
this case). Denote by $\Delta + \xi$ the box $\Delta'$ obtained by translating $\Delta$ by a
vector $\xi$ and let $\langle\cdot\rangle$ denote the average in one element of an orthodic ensemble
(i.e. an average with respect to a Gibbs state). The system is said to show
long-range order if the autocorrelation function at distance $\xi$ is a function
$\langle\nu_\Delta\nu_{\Delta+\xi}\rangle$ which does not tend to zero as $\xi \to \infty$.

Note that failure of convergence to zero of $\langle\nu_\Delta\nu_{\Delta+\xi}\rangle$, see also §5.5, footnote
8, is precisely what we expect should happen if the system had a crystalline
phase (in which case $\langle\nu_\Delta\nu_{\Delta+\xi}\rangle$ should show an oscillatory behavior
in $\xi$). One can also prove that long-range order
of some observable implies that the derivative of the pressure (or of
other thermodynamic functions) with respect to suitable perturbations of
the energy function has a discontinuity, so there is an intimate connection
between phase transitions defined in terms of long-range order and in terms
of singularities of thermodynamic functions.

As an example of an application of a general theorem, the Mermin-Wagner
theorem, [MW66], [Mc67], [Ru69], one can state that if the dimension of the
ambient space is $d = 2$ then a system which in periodic boundary conditions
shows a continuous symmetry cannot have any local observable whose average
vanishes and whose autocorrelations at distance $\xi$ do not tend to zero as
the distance $\xi \to \infty$. This theorem is the first of a series of similar theorems
based on an important kind of inequality called the infrared inequality and
it has led to developments that solved several long-standing problems, see
for instance [Fr81], [DLS78]. Here we choose not to enter into more details
in spite of the great importance of the technique and the reader is referred
to the literature.


10 A translate by the vector $\xi$ of an observable $F$ is defined as the observable $\tau_\xi F$ such
that $\tau_\xi F(x) = F(x + \xi)$ where $x + \xi$ is the configuration obtained from $x$ by translating
by $\xi$ all particles positions, leaving the velocities unchanged.

The limitation to dimension d = 2 is, however, a strong limitation to 
the generality of the theorem and very seldom does it apply to higher- 
dimensional systems. More precisely systems can be divided into classes 
each of which has a “critical dimension” below which too much symme- 
try implies the absence of phase transitions (or of certain kinds of phase 
transitions), see [WF72], [Fr81], [Fr86], [Fi98].


§5.9. Absence of Phase Transitions: High Temperature and the
KS Equations


There is another class of systems in which no phase transitions take place. 
These are the systems so far considered (with stable and tempered interac- 
tions, see §2.2) in states with high temperature and low density. 

We use here as definition of phase transition that of a singularity in the 
equation of state, although in the cases below one could show that phase 
transitions do not occur even in other senses (like persisting sensitivity to 
boundary conditions as the boundaries recede to $\infty$).

One can easily show the absence of phase transitions for $\beta^{-1}$ and $v$ large
by showing that the equation of state is analytic. In fact in such regions
the virial series, (5.1.16), is convergent and we have analyticity in $v^{-1}$ and
$\beta$ of the equation of state.

There are two ways of attacking the problem: one is rather direct and
looks for an algorithm that constructs the coefficients of the virial series.
The algorithm can be found quite easily: but the $k$-th order term results as
a sum of very many terms (a number growing more than exponentially fast
in the order $k$) and it is not so easy (although it can be done) to show by
combinatorial arguments that their sum is bounded by $c(\beta)^k$ if $\beta$ is small
enough, [Gr62], [Pe63], see also equation (4.2), p. 176 in [GMM72], dealing
with a case only apparently different and in fact more general.

The other approach is somewhat less natural but it leads quite easily to 
the desired solution. It attempts to solve a much more general question. 
Namely the problem of computing the functions $f_{V_0}$ of §5.5 and (5.5.1),
i.e. “all the properties” of the system. 

We consider a gas in a cubic container $V$ and with an interaction potential $\varphi$
satisfying (4.1.1). The state of the system, in the grand canonical ensemble,
can be defined in terms of the local distributions discussed in §5.4, (5.4.1),
or in terms of the more convenient (“spatial or configurational”) correlation
functions


$$\rho_V(q_1,\ldots,q_n) = \frac{1}{\Xi_V(\beta,\lambda)}\sum_{m=0}^{\infty} z^{n+m}\int e^{-\beta\Phi(q_1,\ldots,q_n,y_1,\ldots,y_m)}\,\frac{dy_1\ldots dy_m}{m!} \tag{5.9.1}$$


where $z = e^{-\beta\lambda}\big(\sqrt{2\pi m\beta^{-1}h^{-2}}\big)^3$ is called the activity: it has the dimension
of a density i.e. of a length$^{-3}$ as we included in it also the factor $h^{-3}$ which
in (5.5.1) was included in $f_{V_0}$. The correlation functions are, therefore, the
probability densities for finding $n$ particles at the positions $q_1,\ldots,q_n$ with
any momenta and irrespective of where the other particles are. The square
root comes from the integration over the momenta variables (which drop out
of the scene, with no regret as they play a trivial role in classical statistical
mechanics). The integral over the $y$'s is over the volume $V$.

The energy $\Phi(q_1,\ldots,q_n,y_1,\ldots,y_m)$ can be decomposed as:

$$\Phi_1(q_1;q_2,\ldots,q_n) + \sum_{j=1}^{m}\varphi(q_1 - y_j) + \Phi(q_2,\ldots,q_n,y_1,\ldots,y_m) \tag{5.9.2}$$

where $\Phi_1(q_1;q_2,\ldots,q_n) = \sum_{j=2}^{n}\varphi(q_1 - q_j)$ is the energy of interaction of
particle $q_1$ with the group of $(q_2,\ldots,q_n)$.

We can imagine that $q_1$ is the “most interacting particle” among the
$(q_1,\ldots,q_n)$, i.e. it is one that maximizes the potential energy of
interaction with the group of the other particles. Since $\Phi(q_1,\ldots,q_n) \ge -Bn$ by
stability, this implies that

$$\Phi_1(q_1;q_2,\ldots,q_n) \ge -2B. \tag{5.9.3}$$


In some special cases the selection of $q_1$ among the $n$ particles $q_1,\ldots,q_n$ may
be ambiguous; the choice then can be made arbitrarily (for the purposes of
the following argument).
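
The inequality (5.9.3) rests on a simple counting identity: summing over $i$ the energy of interaction of the $i$-th particle with all the others counts every pair twice, so the sum equals $2\Phi(q_1,\ldots,q_n)$, and the largest of the $n$ terms is at least their average $2\Phi/n \ge -2B$. The sketch below checks the identity on a random configuration; the bounded pair potential `morse` and all function names are assumptions chosen for illustration, not objects defined in the text.

```python
import math
import random

def morse(r):
    """Bounded pair potential used only for this illustration (an assumption)."""
    return math.exp(-2.0 * (r - 1.0)) - 2.0 * math.exp(-(r - 1.0))

def total_energy(points, phi):
    """Phi(q_1,...,q_n): sum of phi over unordered pairs."""
    n = len(points)
    return sum(phi(math.dist(points[i], points[j]))
               for i in range(n) for j in range(i + 1, n))

def phi1(i, points, phi):
    """Interaction energy of particle i with all the other particles."""
    return sum(phi(math.dist(points[i], points[j]))
               for j in range(len(points)) if j != i)

def most_interacting(points, phi):
    """Index of a particle maximizing phi1 (ties are broken arbitrarily)."""
    return max(range(len(points)), key=lambda i: phi1(i, points, phi))

if __name__ == "__main__":
    rng = random.Random(0)
    pts = [tuple(rng.uniform(0.0, 3.0) for _ in range(3)) for _ in range(12)]
    i = most_interacting(pts, morse)
    print(i, phi1(i, pts, morse), 2.0 * total_energy(pts, morse) / len(pts))
```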

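Then from the definition (5.9.1) and by using the decomposition (5.9.2)
and if $\Xi_V$ denotes the grand canonical partition function in the volume $V$,
we see that we have the following simple algebraic identities

$$\begin{aligned}
\rho_V(q_1,\ldots,q_n) &= \frac{1}{\Xi_V}\, z\, e^{-\beta\Phi_1(q_1;q_2,\ldots,q_n)} \sum_{m=0}^{\infty} z^{n-1+m} \int e^{-\beta\sum_{j=1}^{m}\varphi(q_1-y_j)}\, e^{-\beta\Phi(q_2,\ldots,q_n,y_1,\ldots,y_m)}\,\frac{dy_1\ldots dy_m}{m!} = \\
&= \frac{1}{\Xi_V}\, z\, e^{-\beta\Phi_1(q_1;q_2,\ldots,q_n)} \sum_{m=0}^{\infty} z^{n-1+m} \int \prod_{j=1}^{m}\big(1 + (e^{-\beta\varphi(q_1-y_j)} - 1)\big)\, e^{-\beta\Phi(q_2,\ldots,q_n,y_1,\ldots,y_m)}\,\frac{dy_1\ldots dy_m}{m!} = \\
&= \frac{1}{\Xi_V}\, z\, e^{-\beta\Phi_1(q_1;q_2,\ldots,q_n)} \sum_{m=0}^{\infty} z^{n-1+m} \sum_{s=0}^{m}\ \sum_{j_1<\ldots<j_s} \int \frac{dy_1\ldots dy_m}{m!}\, \prod_{k=1}^{s}\big(e^{-\beta\varphi(q_1-y_{j_k})} - 1\big)\, e^{-\beta\Phi(q_2,\ldots,q_n,y_1,\ldots,y_m)}
\end{aligned} \tag{5.9.4}$$

having developed the product in the second line. By using the symmetry
in the $y_j$ variables we can suppose that the $j_1,\ldots,j_s$ are in fact $1,2,\ldots,s$
and rewrite (5.9.4) as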


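$$\begin{aligned}
\rho_V(q_1,\ldots,q_n) &= \frac{1}{\Xi_V}\, z\, e^{-\beta\Phi_1(q_1;q_2,\ldots,q_n)} \sum_{s=0}^{\infty} \int \frac{dy_1\ldots dy_s}{s!}\, \prod_{k=1}^{s}\big(e^{-\beta\varphi(q_1-y_k)} - 1\big)\ \cdot \\
&\quad\cdot\ \sum_{t=0}^{\infty} z^{n-1+s+t}\int \frac{dy'_1\ldots dy'_t}{t!}\, e^{-\beta\Phi(q_2,\ldots,q_n,y_1,\ldots,y_s,y'_1,\ldots,y'_t)}
\end{aligned} \tag{5.9.5}$$

having in the last step called $m = s + t$ and replaced, with the
appropriate combinatorial factors required by the change, $(y_1,\ldots,y_m)$ with
$(y_1,\ldots,y_s,y'_1,\ldots,y'_t)$. Hence we see that the integrals reconstruct the
correlation functions and (5.9.4) becomes

$$\rho_V(q_1,\ldots,q_n) = z\, e^{-\beta\Phi_1(q_1;q_2,\ldots,q_n)} \sum_{s=0}^{\infty}\int \frac{dy_1\ldots dy_s}{s!}\,\prod_{k=1}^{s}\big(e^{-\beta\varphi(q_1-y_k)} - 1\big)\,\rho_V(q_2,\ldots,q_n,y_1,\ldots,y_s) \tag{5.9.6}$$

in which the term with $s = 0$ has to be interpreted as:

$$\begin{cases} z\, e^{-\beta\Phi_1(q_1;q_2,\ldots,q_n)}\,\rho_V(q_2,\ldots,q_n) & \text{if } n > 1 \\ z & \text{if } n = 1 \end{cases} \tag{5.9.7}$$

and all the variables $q, y$ are considered to be in $V$.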
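
The two combinatorial facts used in passing from (5.9.4) to (5.9.5) can be verified mechanically. The first is the expansion of $\prod_{j=1}^m(1+f_j)$ as a sum over the subsets $\{j_1<\ldots<j_s\}$ of $\{1,\ldots,m\}$; the second is that the number of such subsets divided by $m!$ equals $1/(s!\,t!)$ with $t = m - s$, which is the “appropriate combinatorial factor”. The sketch below is only a check under these readings; the function names are ours.

```python
from fractions import Fraction
import itertools
import math

def expand_product(f):
    """Sum over all subsets J of {0,...,m-1} of the product of f[j], j in J."""
    total = 0.0
    for s in range(len(f) + 1):
        for J in itertools.combinations(range(len(f)), s):
            term = 1.0
            for j in J:
                term *= f[j]
            total += term
    return total

def subset_weight(m, s):
    """(number of subsets j_1 < ... < j_s of {1,...,m}) divided by m!."""
    count = sum(1 for _ in itertools.combinations(range(m), s))
    return Fraction(count, math.factorial(m))

if __name__ == "__main__":
    f = [0.3, -0.2, 0.5, 0.1]
    print(expand_product(f), math.prod(1.0 + x for x in f))
    print(subset_weight(5, 2), Fraction(1, math.factorial(2) * math.factorial(3)))
```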

Relations (5.9.6) are called the Kirkwood-Salsburg equations: they are
important because we can use them to show that the virial series converges
for $\beta$ and $v^{-1}$ small. And in fact they allow us to obtain a complete theory of
the gases in such regimes of $\beta, v$.

We can regard $\rho_V$ as a sequence of functions “of one, two,... particle
positions”: $\rho_V = \{\rho_V(q_1,\ldots,q_n)\}_{n,q_1,\ldots,q_n}$ vanishing for $q_j \notin V$. If we define
the sequence $\alpha_V$ of functions of one, two,... particle positions by setting
$\alpha_V(q_1) = 1$ if $q_1 \in V$ and $\alpha_V(q_1,\ldots,q_n) = 0$ if $n > 1$ or $n = 1$, $q_1 \notin V$, then
we can write (5.9.6), (5.9.7) as

$$\rho_V = z\alpha_V + zK\rho_V \tag{5.9.8}$$


where, if $\delta_{n>1} = 0$ for $n = 1$ and $\delta_{n>1} = 1$ for $n > 1$,

$$K\rho_V(q_1,\ldots,q_n) = e^{-\beta\Phi_1(q_1;q_2,\ldots,q_n)}\Big(\rho_V(q_2,\ldots,q_n)\,\delta_{n>1} + \sum_{s=1}^{\infty}\int \frac{dy_1\ldots dy_s}{s!}\,\prod_{k=1}^{s}\big(e^{-\beta\varphi(q_1-y_k)} - 1\big)\,\rho_V(q_2,\ldots,q_n,y_1,\ldots,y_s)\Big) \tag{5.9.9}$$

which shows that the Kirkwood-Salsburg equations can be regarded as linear
inhomogeneous “integral” equations for the family of correlation functions
that describe a given system in the box $V$. The kernel $K$ of these equations
is independent of $V$.

We should note that the quantities in $\rho_V$ have different physical dimensions.
In fact $\rho_V(q_1,\ldots,q_n)$ has the dimension of a length to the power $-3n$.
This sounds bad enough to wish to write (5.9.9) in dimensionless form.
For this we need a length scale, and a natural choice is the “range” of the 
potential that could be defined as fae But a more convenient length 


that we can associate with our system is the quantity $r(\beta)$:

$$r(\beta)^3 = \int |e^{-\beta\varphi(q)} - 1|\,d^3q \tag{5.9.10}$$

which can be called the effective range at inverse temperature $\beta$. Note
that $r(\beta) \to 0$ as $\beta \to 0$ if the potential has no hard core; if the potential has a
hard core with radius $a$, and it is smooth and bounded otherwise, then
$r(\beta)^3 \to \frac{4\pi}{3}a^3$ as $\beta \to 0$.

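The length $r(\beta)$ can be used to define the dimensionless correlations
$\bar\rho_V(q_1,\ldots,q_n)$ as:

$$\bar\rho_V(q_1,\ldots,q_n) = r(\beta)^{3n}\rho_V(q_1,\ldots,q_n) \tag{5.9.11}$$

and setting $\zeta = z\,r(\beta)^3$, the above equations can be written in dimensionless
form:

$$\bar\rho_V = \zeta\alpha_V + \zeta\bar K\bar\rho_V, \tag{5.9.12}$$

with

$$\bar K\bar\rho_V(q_1,\ldots,q_n) = e^{-\beta\Phi_1(q_1;q_2,\ldots,q_n)}\Big(\bar\rho_V(q_2,\ldots,q_n)\,\delta_{n>1} + \sum_{s=1}^{\infty}\int \frac{dy_1\ldots dy_s}{s!\,r(\beta)^{3s}}\,\prod_{k=1}^{s}\big(e^{-\beta\varphi(q_1-y_k)} - 1\big)\,\bar\rho_V(q_2,\ldots,q_n,y_1,\ldots,y_s)\Big). \tag{5.9.13}$$

Then we can write the recursive formula:

$$\bar\rho_V = \zeta\alpha_V + \zeta^2\bar K\alpha_V + \zeta^3\bar K^2\alpha_V + \zeta^4\bar K^3\alpha_V + \ldots \tag{5.9.14}$$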


which gives us an expression for the correlation functions, provided the series 
converges, of course. 
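
The structure of (5.9.12) and (5.9.14) is that of a linear equation $x = \zeta a + \zeta Kx$ solved by iteration. A finite dimensional caricature may help: below $K$ is just a small matrix and $a$ a vector, both hypothetical and unrelated to the actual kernel $\bar K$, and `neumann_solve` (our name) sums the series $\zeta a + \zeta^2Ka + \zeta^3K^2a + \ldots$ term by term.

```python
def neumann_solve(K, a, zeta, n_terms=200):
    """Sum zeta*a + zeta^2*K a + zeta^3*K^2 a + ... for a square matrix K.

    K is a list of rows, a is a list; the series is truncated after
    n_terms further terms, which is adequate when zeta*K is a contraction.
    """
    size = len(a)
    term = [zeta * x for x in a]          # first term: zeta * a
    total = list(term)
    for _ in range(n_terms):
        # next term = zeta * K applied to the previous term
        term = [zeta * sum(K[i][j] * term[j] for j in range(size))
                for i in range(size)]
        total = [t + u for t, u in zip(total, term)]
    return total

if __name__ == "__main__":
    K = [[0.2, 0.5], [0.1, 0.3]]
    a = [1.0, 0.0]
    print(neumann_solve(K, a, 0.8))
```

The sum solves $x = \zeta a + \zeta Kx$ whenever the series converges, which in this toy setting means $\zeta$ times the norm of $K$ smaller than 1; the estimate (5.9.16) plays the same role for the true kernel.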
The convergence of the series is easily discussed if one notes that

$$|\bar K^p\alpha_V(q_1,\ldots,q_n)| \le e^{2\beta B}\Big(|\bar K^{p-1}\alpha_V(q_2,\ldots,q_n)| + \sum_{s=1}^{\infty}\int \frac{dy_1\ldots dy_s}{s!\,r(\beta)^{3s}}\,\prod_{k=1}^{s}|e^{-\beta\varphi(q_1-y_k)} - 1|\;|\bar K^{p-1}\alpha_V(q_2,\ldots,q_n,y_1,\ldots,y_s)|\Big) \tag{5.9.15}$$

so that if we call $M(p) = \max_{n;\,q_1,\ldots,q_n}|\bar K^p\alpha_V(q_1,\ldots,q_n)|$ we see that

$$M(p) \le e^{2\beta B}M(p-1)\Big(1 + \sum_{s=1}^{\infty}\frac{1}{s!}\Big) = e^{2\beta B+1}M(p-1) \tag{5.9.16}$$

and $M(0)$ has to be set equal to 1. This implies that $M(p) \le e^{(2\beta B+1)p}$ so
that the series (5.9.14) converges if $|z| < e^{-(2\beta B+1)}r(\beta)^{-3}$.
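
To get a feeling for the size of the convergence region one can evaluate (5.9.10) and the bound $|z| < e^{-(2\beta B+1)}r(\beta)^{-3}$ numerically. The sketch below assumes a radial potential with a hard core of radius `core` and a tail `phi_tail` outside it; the function names, the cut-off `r_max` and the sample tail $-e^{-r}$ are assumptions made for illustration only. For pure hard spheres the stability constant can be taken $B = 0$ and $r(\beta)^3$ is the volume of the sphere of radius $a$.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def effective_range_cubed(beta, phi_tail, core, r_max=40.0):
    """r(beta)^3 = integral of |exp(-beta*phi) - 1| over R^3 for a radial
    potential: +infinity inside the core, phi_tail(r) for r > core."""
    inside = 4.0 * math.pi * core ** 3 / 3.0      # |0 - 1| on the core
    outside = simpson(
        lambda r: 4.0 * math.pi * r * r * abs(math.exp(-beta * phi_tail(r)) - 1.0),
        core, r_max)
    return inside + outside

def ks_activity_bound(beta, B, r3):
    """Right-hand side of |z| < exp(-(2*beta*B + 1)) / r(beta)^3."""
    return math.exp(-(2.0 * beta * B + 1.0)) / r3

if __name__ == "__main__":
    r3 = effective_range_cubed(1.0, lambda r: 0.0, 1.0)
    print(r3, ks_activity_bound(1.0, 0.0, r3))
```

With a smooth bounded tail the computed $r(\beta)^3$ decreases to the hard core volume $\frac{4\pi}{3}a^3$ as $\beta \to 0$, in agreement with the remark following (5.9.10).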

The convergence is uniform (as $V \to \infty$) and $(\bar K^p\alpha_V)(q_1,\ldots,q_n)$ tends to a
limit as $V \to \infty$ at fixed $q_1,\ldots,q_n$ and the limit is simply $(\bar K^p\alpha)(q_1,\ldots,q_n)$
if $\alpha(q_1,\ldots,q_n) = 0$ unless $n = 1$, and $\alpha(q_1) = 1$. This is because the kernel
$\bar K$ contains the factors $(e^{-\beta\varphi(q_1-y)} - 1)$ which will tend to zero for $y \to \infty$
not slower than $|\varphi(y)|$, i.e. summably by the temperedness condition. It is
also clear that $(\bar K^p\alpha)(q_1,\ldots,q_n)$ is translation invariant.

Hence the limits as $V \to \infty$ of the correlation functions do exist and they
can be computed by a convergent power series in $z$; the correlation
functions will be translation invariant in the thermodynamic limit, i.e. the lack
of translation symmetry, due to the confinement in the box $V$, disappears
when the box recedes to $\infty$.

In particular the one-point correlation function $\rho = \rho(q)$ is simply $\rho =
z\,(1 + O(z\,r(\beta)^3))$, which to lowest order in $z$ just shows that the activity
can be identified with the density. Activity and density essentially coincide
when they are small.

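Furthermore $\beta p_V = \frac{1}{V}\log\Xi_V(\beta,\lambda)$ has the property that $z\,\partial_z\,\beta p_V =
\frac{1}{V}\int\rho_V(q)\,dq$, as is immediately checked (by using the definition of $\rho_V$ in
(5.9.1)). Therefore the above remarks imply:

$$\beta p(\beta,z) = \lim_{V\to\infty}\frac{1}{V}\log\Xi_V(\beta,\lambda) = \int_0^z \frac{dz'}{z'}\,\rho(\beta,z'), \tag{5.9.17}$$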


hence the grand canonical pressure $p(\beta,z)$ is analytic in $\beta, z$. The density
$\rho$ is analytic in $z$ as well and $\rho \sim z$ for $z$ small. It follows that the pressure
is analytic in the density and $\beta p = \rho\,(1 + O(\rho))$, at small density. In other
words the equation of state is, to lowest order, essentially the equation of a
perfect gas, and all the quantities that we may want to study are analytic
functions of temperature and density.
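
Relation (5.9.17) can be tried on cases where everything is known in closed form. In the sketch below `pressure_from_density` (a name of ours) integrates $\rho(z')/z'$ by the midpoint rule. The two test densities are assumptions chosen because the answer is elementary: the perfect gas $\rho = z$, giving $\beta p = z = \rho$, and a non-interacting lattice gas with at most one particle per site, for which $\rho = z/(1+z)$ per site and $\beta p = \log(1+z)$.

```python
import math

def pressure_from_density(rho, z, n=10000):
    """beta*p = integral from 0 to z of rho(z')/z' dz', midpoint rule.

    The midpoint rule never evaluates the integrand at z' = 0, where
    rho(z')/z' is a 0/0 expression with a finite limit.
    """
    h = z / n
    return sum(rho((i + 0.5) * h) / ((i + 0.5) * h) for i in range(n)) * h

if __name__ == "__main__":
    z = 0.5
    print(pressure_from_density(lambda x: x, z), z)
    print(pressure_from_density(lambda x: x / (1.0 + x), z), math.log(1.0 + z))
```

In the second case $\beta p = \log(1+z) = -\log(1-\rho) = \rho\,(1 + \rho/2 + \ldots)$, an example of a pressure analytic in the density and reducing to the perfect gas law to lowest order.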

The system is essentially a free gas and it has no phase transitions in the 
sense of a discontinuity or a singularity in the dependence of a thermody- 
namic function in terms of others. 

However the system also cannot show phase transitions in the sense of 
sensitive dependence on the boundary conditions: this is essentially clear 
from the above analysis (i.e. from the remarked short range nature of the 
kernel K) which shows that the dependence on the boundary condition 
disappears as the boundary recedes to infinity while translation invariance 
is recovered. 

One could, nevertheless, think that by taking other boundary conditions
the argument may fail. It can however be shown that this is not the case,
simply by pushing the above analysis a little further. The key remark is in
fact that any infinite volume state, obtained by any sequence of boundary
conditions with fixed external particles, will obey the DLR equations, see
§5.6, and the latter can be shown to imply the “infinite volume
Kirkwood-Salsburg equations”, [La70]. The latter are simply (5.9.13) with $V$ replaced
by $R^3$, and make sense also for infinite volume as soon as the correlation
functions satisfy a bound like $\rho(q_1,\ldots,q_n) \le \xi^n$ for some $\xi$ no matter how
large.

The limits of finite volume states with fixed external particle boundary
conditions do satisfy a bound of this type and also the DLR equations
and, therefore, the Kirkwood-Salsburg equations in infinite volume.
The uniqueness of the solutions of such equations proves, in the region
$|z|\,e^{2\beta B+1}r(\beta)^3 < 1$, the boundary condition independence (hence the
translation invariance) of the Gibbs states, [Do68c], [LR69].

Finally one can also see that the state of the system can be regarded as
describing a distribution of particles in which particles occupying regions
that are far apart are “independently distributed”. There are several ways
to express this property. The simplest is to say that the correlations have a
cluster property, see footnote 8, §5.5. This means that

$$\lim_{a\to\infty}\rho(q_1,\ldots,q_n,q'_1+a,\ldots,q'_{n'}+a) = \rho(q_1,\ldots,q_n)\,\rho(q'_1,\ldots,q'_{n'}) \tag{5.9.18}$$

and this property is an immediate consequence of the above analysis in the
small $\beta$, small $\rho$ regions.

In fact, restricting ourselves for simplicity to the case in which the potential
has finite range $r_0$ we easily check that

$$z(zK)^p\alpha(q_1,\ldots,q_n,q'_1+a,\ldots,q'_{n'}+a) = \sum_{p_1+p_2=p-1} z(zK)^{p_1}\alpha(q_1,\ldots,q_n)\; z(zK)^{p_2}\alpha(q'_1,\ldots,q'_{n'}) \tag{5.9.19}$$


for all $p$ and provided the distance between the cluster $q_1,\ldots,q_n$ and the
cluster $q'_1 + a,\ldots,q'_{n'} + a$ is greater than $p\,r_0$.

This is satisfied by induction and implies that the power series expansion
for the difference between the expression under the limit sign in (5.9.18)
and the right-hand side starts at $p = O(a/r_0)$ because all the coefficients
of lower order must vanish due to the fact that the kernel of the operator
$K$ vanishes when its arguments contain points that are too far away. In
fact if the argument contains $s + 1$ points $q_1, y_1,\ldots,y_s$ and the maximum
distance between them is greater than $(s + 1)r_0$ then at least one is further
away than $r_0$ from $q_1$, see (5.9.9). Then the above proved convergence
shows that the limit is approached exponentially at a speed that is at least
$(z\,r(\beta)^3e^{2\beta B+1})^{|a|/r_0}$ (this being the rate of approach to zero of the remainder
of a geometric series with ratio $z\,r(\beta)^3e^{2\beta B+1}$ and starting at order $|a|/r_0$).

Hence if one wants to look for phase transitions one must forget the regions
of low density and high temperature. The Kirkwood-Salsburg equations
are only one example of equations leading to convergent expansions for
the correlation functions: there are many recent developments based on
similar equations that are derived for other models or even for the same
ones considered above. The most interesting concern lattice models. See
§5.10 below and [Ca83], [KP86], [Br86] for some examples.


§5.10. Phase Transitions and Models


As already mentioned the problem of showing the existence of phase tran- 
sitions in models of homogeneous gases, which we have been considering so 
far, is in fact still open. 

Therefore it makes sense to study the phase transitions problem in simpler 
models, tractable to some extent but nontrivial. In fact such an investigation 
can give a very detailed and deep understanding of the phase transition 
phenomenon. 

The simplest models are the so-called lattice models. They are models in 
which the particles are constrained to occupy points of a lattice in space. 
In such models particles cannot move in the ordinary sense of the word 
(because they are on a lattice and the motion would have to take place by 
jumps) and therefore their configurations do not contain momenta variables. 

The energy of interaction is just a potential energy and the ensembles
are defined as probability distributions on the position coordinates of the
particle configurations. Usually the potential is a pair potential decaying fast
at $\infty$ and, often, with a hard core forbidding double or higher occupancy of
the same lattice site.

Often the models allow at most one particle to occupy each lattice site. For
instance the nearest neighbor lattice gas, on a square lattice with mesh $a > 0$,
is defined by the potential energy that is attributed to the configuration $X$
of occupied sites:

$$H(X) = \sum_{x,y\in X}\varphi(x - y), \qquad \varphi(x) = \begin{cases} J & \text{if } |x| = a \\ 0 & \text{otherwise} \end{cases} \tag{5.10.1}$$
xz, yEX i 


One can define the canonical ensemble, with parameters $\beta, N$, in a box $\Lambda$
simply as the probability distribution of the subsets of $\Lambda$ with $N$ points:

$$p(X) = \frac{e^{-\beta H(X)}}{\sum_{|X'|=N,\ X'\subset\Lambda} e^{-\beta H(X')}}, \qquad |X| = N \tag{5.10.2}$$

where $|X|$ is the number of points in the set $X$; and likewise the grand
canonical ensemble with parameters $\beta, \lambda$ in the box $\Lambda$ by

$$p(X) = \frac{e^{-\beta\lambda|X|}\,e^{-\beta H(X)}}{\sum_{X'\subset\Lambda} e^{-\beta\lambda|X'|}\,e^{-\beta H(X')}}. \tag{5.10.3}$$

Finally we can remark that a lattice gas in which in each site there can be
at most one particle can be regarded as a model for the distribution of a
family of spins on a lattice.
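
For a very small box the two ensembles can be written out by direct enumeration. The sketch below makes several assumptions for the sake of a runnable illustration: the box is a ring of $L$ sites (so that there are no boundary effects), the energy is the nearest neighbor one summed over ordered pairs $x \ne y$, the grand canonical weight is taken as $e^{-\beta\lambda|X|}e^{-\beta H(X)}$, and all function names are ours.

```python
import itertools
import math

def ring_energy(X, L, J):
    """H(X): sum over ordered pairs x != y in X of phi(x - y), where
    phi = J at ring distance 1 and 0 otherwise (requires L >= 3)."""
    total = 0.0
    for x in X:
        for y in X:
            if x != y and min((x - y) % L, (y - x) % L) == 1:
                total += J
    return total

def subsets(L):
    """All subsets of the ring {0,...,L-1}, as sorted tuples."""
    for n in range(L + 1):
        for X in itertools.combinations(range(L), n):
            yield X

def grand_canonical(L, beta, lam, J):
    """Probabilities of all subsets X with weight exp(-beta*lam*|X| - beta*H(X))."""
    w = {X: math.exp(-beta * lam * len(X) - beta * ring_energy(X, L, J))
         for X in subsets(L)}
    Z = sum(w.values())
    return {X: v / Z for X, v in w.items()}

def canonical(L, beta, N, J):
    """Probabilities of the subsets with exactly N points, weight exp(-beta*H(X))."""
    w = {X: math.exp(-beta * ring_energy(X, L, J))
         for X in itertools.combinations(range(L), N)}
    Z = sum(w.values())
    return {X: v / Z for X, v in w.items()}

if __name__ == "__main__":
    gc = grand_canonical(6, 1.0, 0.3, -0.5)
    print(sum(gc.values()), max(gc, key=gc.get))
```

A useful consistency check is that the grand canonical distribution, conditioned on $|X| = N$, coincides with the canonical one: the factor $e^{-\beta\lambda N}$ is common to all configurations with $N$ particles and cancels.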
Such models are quite common and useful: for instance they arise in
studying systems with magnetic properties. One simply identifies as “occupied”
a site with a “spin up” or $+$ and as an “empty” site a site with a “spin
down” or $-$ (of course one could make the opposite choice). If $\sigma = \{\sigma_x\}_{x\in\Lambda}$
is a spin configuration, the energy of the configuration will usually take the
form

$$H(\sigma) = \sum_{x,y\in\Lambda}\varphi(x - y)\,\sigma_x\sigma_y + h\sum_{x\in\Lambda}\sigma_x \tag{5.10.4}$$

and one calls canonical and grand canonical ensembles in the box $\Lambda$ with
respective parameters $\beta, M$ or $\beta, h$ the probability distributions on the spin
configurations $\sigma = \{\sigma_x\}_{x\in\Lambda}$ with $\sum_{x\in\Lambda}\sigma_x = M$ or without constraint on $M$,
respectively, defined by


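$$p_{\beta,M}(\sigma) = \frac{e^{-\beta\sum_{x,y}\varphi(x-y)\sigma_x\sigma_y}}{\sum_{\sigma'} e^{-\beta\sum_{x,y}\varphi(x-y)\sigma'_x\sigma'_y}}, \qquad p_{\beta,h}(\sigma) = \frac{e^{-\beta h\sum_x\sigma_x - \beta\sum_{x,y}\varphi(x-y)\sigma_x\sigma_y}}{\sum_{\sigma'} e^{-\beta h\sum_x\sigma'_x - \beta\sum_{x,y}\varphi(x-y)\sigma'_x\sigma'_y}} \tag{5.10.5}$$


where the sums in the denominators run over the $\sigma'$ with $\sum_x\sigma'_x = M$ in
the first case and over all $\sigma'$'s in the second case.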

As in the study of the previous continuous systems one can define the
canonical and grand canonical ensembles with “external fixed particle
configurations” and the corresponding ensembles with “external fixed spin
configurations”.

For each configuration $X \subset \Lambda$ of a lattice gas we define $\{n_x\}$ to be $n_x = 1$
if $x \in X$ and $n_x = 0$ if $x \notin X$. Then the transformation:

$$\sigma_x = 2n_x - 1 \tag{5.10.6}$$


establishes a correspondence between lattice gas and spin distributions. In
this correspondence lattice gases with canonical (or grand canonical)
distributions and given boundary conditions with external fixed particles are
mapped into canonical (or grand canonical) spin distributions with suitably
corresponding boundary conditions of external fixed spins.

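In the correspondence the potential $\varphi(x - y)$ of the lattice gas generates
a potential $\frac14\varphi(x - y)$ for the corresponding spin system. The chemical
potential $\lambda$ for the lattice gas becomes the magnetic field $h$ for the spin
system with $h = \frac12\big(\lambda + \sum_{x\ne0}\varphi(x)\big)$:

$$\varphi(x) \to \frac14\varphi(x), \qquad \lambda \to h = \frac12\Big(\lambda + \sum_{x\ne0}\varphi(x)\Big). \tag{5.10.7}$$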


The correspondence between boundary conditions is also easy: for instance a
boundary condition for the lattice gas in which all external sites are occupied
becomes a boundary condition in which all the external sites contain a spin
$+$. The correspondence between lattice gas and spin systems is so complete
that one often switches from one to the other with little discussion.
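
The dictionary between the two languages can be tested by enumeration. The sketch below assumes, for illustration only, a ring of $L$ sites with nearest neighbor potential $\varphi = J$ at distance 1, energies summed over ordered pairs $x \ne y$, and a lattice gas weight governed by $H(X) + \lambda|X|$; the function names are ours. With $\sigma_x = 2n_x - 1$, potential $\frac14\varphi$ and field $h = \frac12(\lambda + \sum_{x\ne0}\varphi(x))$ the two energies should differ by a constant independent of the configuration, so that the two probability distributions coincide.

```python
import itertools

def phi(x, L, J):
    """Nearest neighbor potential on a ring of L sites (L >= 3)."""
    d = min(x % L, (-x) % L)
    return J if d == 1 else 0.0

def gas_energy(n, L, J, lam):
    """H(X) + lam*|X| with H summed over ordered pairs; n[x] is 0 or 1."""
    pair = sum(phi(x - y, L, J) * n[x] * n[y]
               for x in range(L) for y in range(L) if x != y)
    return pair + lam * sum(n)

def spin_energy(sigma, L, J, lam):
    """Spin energy with potential phi/4 and field h = (lam + sum_{x!=0} phi(x))/2."""
    h = 0.5 * (lam + sum(phi(x, L, J) for x in range(1, L)))
    pair = sum(0.25 * phi(x - y, L, J) * sigma[x] * sigma[y]
               for x in range(L) for y in range(L) if x != y)
    return pair + h * sum(sigma)

def energy_offsets(L, J, lam):
    """Gas energy minus spin energy for every configuration of the ring."""
    offsets = []
    for n in itertools.product((0, 1), repeat=L):
        sigma = tuple(2 * nx - 1 for nx in n)
        offsets.append(gas_energy(n, L, J, lam) - spin_energy(sigma, L, J, lam))
    return offsets

if __name__ == "__main__":
    offs = energy_offsets(6, -0.7, 0.4)
    print(min(offs), max(offs))
```

On the ring the constant is $L\,(\frac14\sum_{x\ne0}\varphi(x) + \frac12\lambda)$; in a box with external fixed particles the same computation goes through, the external particles being replaced by the corresponding external spins.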

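The thermodynamic limits for the partition functions of lattice gas models,
defined by

$$\beta f(\beta,v) = \lim_{\Lambda\to\infty,\ |\Lambda|/N = v}\ \frac{1}{N}\log\sum_{|X|=N,\ X\subset\Lambda} e^{-\beta H(X)} \tag{5.10.8}$$

and

$$\beta p(\beta,\lambda) = \lim_{\Lambda\to\infty}\ \frac{1}{|\Lambda|}\log\sum_{X\subset\Lambda} e^{-\beta H(X)-\beta\lambda|X|} \tag{5.10.9}$$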

can be shown to exist, by an argument similar to that discussed in Chap.IV
(and by far easier). They have the same convexity and continuity properties
as the corresponding quantities in the case of the continuous models and
they will be given the same names (free energy and pressure). They are
boundary condition independent, as was the case in the continuum models
with hard core interactions.

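Likewise the thermodynamic limits exist also for the spin models partition
functions and they are denoted by $f, p$:

$$\beta f(\beta, m) = \lim_{\Lambda\to\infty,\ M/|\Lambda| \to m} \frac{1}{|\Lambda|} \log \sum_{\sum_x \sigma_x = M} e^{-\beta H(\sigma)} \qquad (5.10.10)$$

and

$$\beta p(\beta, h) = \lim_{\Lambda\to\infty} \frac{1}{|\Lambda|} \log \sum_{\sigma} e^{-\beta H(\sigma)}, \qquad (5.10.11)$$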


however the physical interpretations of $f, p$ are of course different. To find
the meaning of the above quantities in the Thermodynamics of a spin system
one would have to go through the discussion of the orthodicity again, in the
case of such systems. One would find, as is easy to check, that $p(\beta, h)$ has
the interpretation of magnetic free energy while $f(\beta, m)$ is a quantity that
does not have a special name in the Thermodynamics of magnetic systems.

In the next chapter we shall consider some special cases: they are the 
simplest and they are quite remarkable as in some particular instances they 
are even amenable to more or less exact solution (i.e. calculation of the 
thermodynamic limit of various quantities, like for instance the free energy). 

As will become apparent, the interest of such models lies in the wealth of
information that they provide about the phenomena related to phase transitions.


One of the developments of the late 1960s and early 1970s is that natural
“extensions” to lattice spin systems of the formalism discussed in §5.3-§5.7
arise in rather unexpected contexts, see Chap. IX.

Such “extended” lattice system models are spin systems more general than
the model (5.10.4). For instance they allow the spins $\sigma_x$ at each site $x$ to
take values in an “arbitrary” finite set of symbols or “spin values” (rather
than necessarily $\sigma_x = \pm 1$). The number of values of such “spins”, minus one
and divided by 2, is called the total spin: so that the case $\sigma = \pm 1$ is the
“spin $\frac12$” case.

It is convenient, for later reference purposes, to introduce here such ex- 
tensions: they will be one-dimensional models without phase transitions at 
least in the cases that we shall later consider in Chap.IX; they can be ex- 
tended also to higher dimension and, as such, they will appear in Chap. VII. 
The model energy H has the form 


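$$H = \sum_{n=1}^{\infty}\ \sum_{x_1<\ldots<x_n} \varphi_{x_1,\ldots,x_n}(\sigma_{x_1},\ldots,\sigma_{x_n}) \qquad (5.10.12)$$

where $\varphi_{x_1,\ldots,x_n}(\sigma_1,\ldots,\sigma_n) = \varphi_{x_1+a,\ldots,x_n+a}(\sigma_1,\ldots,\sigma_n)$ for all $a \in Z$
(“translation invariance”), and

$$\sum_{n=1}^{\infty}\ \sum_{0=x_1<\ldots<x_n} w_n \max_{\sigma_1,\ldots,\sigma_n} |\varphi_{x_1,\ldots,x_n}(\sigma_1,\ldots,\sigma_n)| < \infty \qquad (5.10.13)$$

for some weights $w_n \ge 1$. If $w_n = e^{\kappa|x_n - x_1|}$ one says that the model (5.10.12)
is a short-range Ising model with many body interactions. The quantity:

$$\lambda(\sigma) = \sum_{n=1}^{\infty}\ \sum_{0=x_1<\ldots<x_n} \varphi_{x_1,\ldots,x_n}(\sigma_{x_1},\ldots,\sigma_{x_n}) \qquad (5.10.14)$$

will be called the “energy per site”; a few properties of $\lambda$ should be noted.
Namely $\lambda$ is “Hölder continuous”: i.e. if $\sigma, \sigma'$ are two spin configurations
agreeing for $|i| < k$ (i.e. $\sigma_i = \sigma'_i$ for all $|i| < k$), and if the interaction has
short range in the above sense, then for some $\kappa > 0$,

$$|\lambda(\sigma) - \lambda(\sigma')| < \mathrm{const}\, e^{-\kappa k}. \qquad (5.10.15)$$

This means that $\lambda(\sigma)$ depends “exponentially little” on the spins located far
from the origin.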
The partition function of the model with “open” boundary conditions will
be simply

$$Z = \sum_{\sigma} e^{-\beta H(\sigma)}. \qquad (5.10.16)$$


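More generally one can consider the model in the presence of “fixed spin
boundary conditions”. This means that for each configuration $\sigma^\Lambda$ in the
box $\Lambda = [-L, L]$ we consider the biinfinite configuration $\sigma = (\sigma^L, \sigma^\Lambda, \sigma^R)$
obtained by putting $\sigma^\Lambda$ on the lattice and then continuing it outside $\Lambda$
with $\sigma^L$ to the left and $\sigma^R$ to the right, where $\sigma^L = (\ldots, \sigma^L_{-1}, \sigma^L_0)$ and
$\sigma^R = (\sigma^R_0, \sigma^R_1, \ldots)$. The probability $\mu(\sigma^\Lambda)$ of a configuration in the model
“with boundary conditions $\sigma^L, \sigma^R$” will be:

$$\mu(\sigma^\Lambda) = \frac{e^{-\beta H(\sigma^L \sigma^\Lambda \sigma^R)}}{\sum_{\tilde\sigma^\Lambda} e^{-\beta H(\sigma^L \tilde\sigma^\Lambda \sigma^R)}} \qquad (5.10.17)$$

where, if $\vartheta$ denotes the shift operation on the bilateral sequences and we set
$\sigma = (\sigma^L \sigma^\Lambda \sigma^R)$, the energy $H$ is

$$H(\sigma^L \sigma^\Lambda \sigma^R) = H(\sigma) = \sum_{k=-L}^{L} \lambda(\vartheta^k \sigma) \qquad (5.10.18)$$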


so that one can remark that the exponential is simply written in terms of
the energy per site $\lambda(\sigma)$; (5.10.12) can also be expressed in a similar way.
In fact let $\sigma$ be the biinfinite configuration obtained by extending $\sigma^\Lambda$
periodically outside the region $\Lambda$. Then the energy of interaction between
the spins in $\Lambda$ and between them and the ones outside $\Lambda$ is:

$$H(\sigma^\Lambda) = \sum_{k=-L}^{L} \lambda(\vartheta^k \sigma) + \text{corrections} \qquad (5.10.19)$$

where the “corrections” depend “only” on the spins near the boundary
points $\pm L$ and outside the interval $\Lambda = [-L, L]$, in the sense that by varying
the spin at a site at distance $\ell$ from the boundary the correction changes
by a quantity proportional to $e^{-\kappa\ell}$ if $\kappa$ is the exponent in the weight $w_n$
introduced above, see (5.10.13).

A further extension is obtained by considering a matrix $T$ whose entries
$T_{\sigma\sigma'}$ are labeled by the spin values and are supposed to be $T_{\sigma\sigma'} = 0, 1$,
and by restricting the family of spin configurations $\sigma$ to the $T$-compatible
configurations: they are defined to be those that satisfy $T_{\sigma_i\sigma_{i+1}} = 1$ for
all $i$. The matrix $T$ is called the compatibility matrix. One also
calls such models “hard core spin systems” and $T$ describes the hard core
structure. Such systems are also called hard core lattice systems.

If the matrix $T$ is such that $(T^n)_{\sigma\sigma'} > 0$, for all $\sigma, \sigma'$ and for $n$ large enough,
one says that the hard core is mixing. For such cases all the above formulae
and definitions (5.10.12)-(5.10.19) extend unchanged provided only
compatible spin configurations are considered.
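
The mixing condition can be tested mechanically on a given compatibility matrix. The following is only a minimal sketch in Python (using NumPy) and is not part of the original treatment: the names `mixing_power` and `count_compatible` are hypothetical, the matrix is assumed to be given as a square array of 0's and 1's, and the search is stopped at Wielandt's bound $(s-1)^2+1$ for an $s \times s$ matrix, beyond which a matrix for which no strictly positive power has been found cannot be mixing.

```python
import numpy as np


def mixing_power(T):
    """Smallest n such that every entry of T**n is positive, or None.

    Only the 0/1 pattern of the powers is tracked, so no overflow can occur.
    """
    T = (np.asarray(T) != 0).astype(np.int64)
    s = T.shape[0]
    bound = (s - 1) ** 2 + 1          # Wielandt's bound for primitive matrices
    pattern = np.eye(s, dtype=np.int64)
    for n in range(1, bound + 1):
        pattern = ((pattern @ T) > 0).astype(np.int64)
        if pattern.all():
            return n
    return None


def count_compatible(T, length):
    """Number of T-compatible strings (s_1, ..., s_length), i.e. strings
    with T[s_i, s_{i+1}] = 1 for all consecutive pairs."""
    T = (np.asarray(T) != 0).astype(np.int64)
    return int(np.linalg.matrix_power(T, length - 1).sum())


if __name__ == "__main__":
    no_two_adjacent_ones = [[1, 1], [1, 0]]   # the pair (1, 1) is forbidden
    print(mixing_power(no_two_adjacent_ones))
    print(count_compatible(no_two_adjacent_ones, 3))
```

A matrix that merely permutes the spin values, such as `[[0, 1], [1, 0]]`, is not mixing: its powers alternate between two patterns, each of which contains zeros, and the function returns `None`.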


Appendix 5.A1: Absence of Phase Transition in non Nearest 
Neighbor One-Dimensional Systems 


The method discussed in §5.8 for hard core nearest neighbor one-dimensional
models is called the transfer matrix method. We extend it
here to the more general case of finite range, but not nearest neighbor. The
theory is very similar to that in §5.8.

Let $\varphi(r) = 0$ for $r > (n-1) r_0$ for some integer $n$. Assume for simplicity
that $N$ is a multiple of $n$. Let $q = (q_1, \ldots, q_n)$ with $q_1 < q_2 < \ldots < q_n$; and
let us define

$$a(q) = a(q_1, \ldots, q_n) = \frac12 \sum_{i<j} \varphi(q_j - q_i) \qquad (5.A1.1)$$

$$b(q\,|\,q') = b(q_1, \ldots, q_n\,|\,q'_1, \ldots, q'_n) = \sum_{i,j} \varphi(q'_j - q_i)$$

$$c(q\,|\,q') = a(q) + b(q\,|\,q') + a(q').$$

Then the energy of a particle configuration is

$$a(q_1, \ldots, q_n) + a(q_{n((N/n)-1)+1}, \ldots, q_N) + \sum_{k=0}^{(N/n)-2} c(q_{nk+1}, \ldots, q_{nk+n}\,|\,q_{n(k+1)+1}, \ldots, q_{n(k+2)}) \qquad (5.A1.2)$$


so that one easily finds

$$J_N(\beta, p) = \langle a|\, T^{(N/n)-1}\, |a\rangle \qquad (5.A1.3)$$

where $T$ is the operator acting on the space of the functions of $n$ coordinates
$q = (q_1, q_2, \ldots, q_n)$ with $q_{j+1} - q_j > r_0$:


$$T f(q) = \int e^{-\beta c(q\,|\,q')}\, e^{-\frac12 (q_n + q'_n) \beta p}\, f(q')\, dq' \qquad (5.A1.4)$$

and the vector $|a\rangle$ is the function $e^{-\frac12 \beta p\, q_n - \beta a(q)}$.

Since the operator $T$ is a Hilbert-Schmidt operator on the space $L_2(dq)$
(i.e. $\int |T(q, q')|^2\, dq\, dq' < +\infty$), and since its kernel is $> 0$, it “immediately”
follows (i.e. it follows from well known results on the theory of operators, or
better of matrices, like the Perron-Frobenius theorem, see p. 136 in [Ru69])
that the largest eigenvalue $t(\beta, p)$ of $T$ is isolated and simple and therefore
it is analytic as a function of $\beta, p$, since $T$ itself is analytic in such variables.
Therefore $\lambda(\beta, p) = \frac1n \log t(\beta, p)$ is analytic in $\beta, p$ for $\beta, p > 0$, and convex
in such variables (see (5.8.2) showing that $J_N$ is a “linear combination” of
functions depending on $\beta$ as $e^{-\beta c}$, hence $\log J_N(\beta, p)$ is convex in $\beta$) and we
can repeat the argument above to see that the equation of state gives $p$ as
an analytic function of $\beta, v$.

The further extension to systems with a potential with infinite range but
satisfying (5.8.4) is also possible and it was the main purpose of van Hove’s
theorem, [VH50]. The condition (5.8.4) comes in to ensure that $b(q\,|\,q')$
is uniformly bounded: this quantity represents the interaction between a
configuration $q$ situated to the left of another configuration $q'$, hence it is
uniformly bounded if (5.8.4) holds.


§6.1. The Ising Model. Inequivalence of Canonical and Grand
Canonical Ensembles¹


The Ising model² plays a very special role in statistical mechanics and
generates the simplest nontrivial example of a system undergoing phase 
transitions. 

Its analysis has provided us with deep insights into the general nature of 
phase transitions, which are certainly better understood nowadays, after the 
publication of the hundreds of papers which followed the pioneering work of 
Ising, Peierls, Onsager, Kaufman and Yang, [Pe36], [On44], [Ka49], [KO49], 
[Ya52]. 

The main reason why so much attention has been given to this very special 
model lies in its simplicity and, in spite of it, in the fact that it first gave firm 
and quantitative indications that a microscopic short-range interaction can 
produce phase transitions which, furthermore, deeply differ in character 
from the classical van der Waals’ (or Curie-Weiss’ or mean field) type of 
transitions, see §5.1 and §5.2.

It should also be mentioned that the two-dimensional Ising model in zero
external field is exactly solvable (see §7.4);³ this fact has very often been
used to check the validity of numerical approximations designed for
applications to more complicated models, see the review [Fi64], pp. 677-702.


Last but not least, we mention that the Ising model has given rise to 
a number of interesting developments and reinterpretations of old results 
in the theory of Markov chains, [Do68],[Sp71], information theory, [Ru69], 
[Or74], [RM75], random walks, [Gr67],[Fi67b],[GH64], [La85], to quote a few 
remarkable works, and therefore constitutes a notable example of a subject 
which has simultaneously been the object of advanced research in Physics, 
Mathematics and Mathematical Physics. 

In the rest of this chapter we give a description, certainly not exhaustive,
of the model and of some selected results. They illustrate properties which
throw some light on the general nature of the phenomenon of phase
transitions, mostly far from the critical point, and which, hopefully, should not
be a peculiarity of the simplicity of the model.


1 This chapter is mostly taken from the paper Instabilities and phase transitions in the
Ising model, La Rivista del Nuovo Cimento, 2, 133-169, 1972.

2 For a history of the Ising model see [Br69].

3 The original solution for the free energy of the Ising model in two dimensions can be
found in [On44]. It was preceded by the proofs of existence of Peierls, [Pe36], and van
der Waerden, [VW41], and by the exact location of the critical temperature by Kramers
and Wannier, [KW41].
The spontaneous magnetization was found by Onsager, [KO49], but the details were
never published; it was subsequently rediscovered by Yang, [Ya52]. A modern derivation
of the solution is found in the review article by Schultz, Mattis and Lieb, [SML64]: the
latter is reproduced in §7.4. Another interesting older review article is the paper [NM53].
A combinatorial solution has been found by Kac and Ward and can be found in [LL67],
p. 538. Some aspects of this derivation were later clarified, and it has been discussed again
in several papers, see [Be69]. Another approach to the solution (Kasteleyn’s Pfaffian
method) can be found in [Ka61].

There exist some very good accounts of the theoretical arguments leading 
to the consideration of the Ising model in the context of physical problems, 
[Fi67a], [Ma65]; here we shall completely skip this aspect of the matter.⁴


§6.2. The Model. Grand Canonical and Canonical Ensembles.
Their Inequivalence


We consider a $d$-dimensional ($d = 1, 2, 3$) square lattice $Z^d$ and a finite
square $\Lambda \subset Z^d$ centered around the origin, containing $|\Lambda| = L^d$ lattice sites.

On each site $x \in \Lambda$ is located a classical “spin” $\sigma_x = \pm 1$. The
“configurations” of our system will, therefore, consist of a set $\sigma = (\sigma_{x_1}, \ldots, \sigma_{x_{|\Lambda|}})$
of $|\Lambda|$ numbers $\sigma_x = \pm 1$; the number of these configurations is $2^{|\Lambda|}$. The
ensemble of the configurations will be denoted $U(\Lambda)$.

To each spin configuration a certain energy is assigned, see §5.10:

$$H_\Lambda(\sigma) = -J \sum_{\langle i,j \rangle} \sigma_{x_i} \sigma_{x_j} - h \sum_i \sigma_{x_i} + B_\Lambda(\sigma) \qquad (6.2.1)$$

where $\sum_{\langle i,j \rangle}$ means that the sum is over pairs $(x_i, x_j)$ of neighboring
points, $h$ is an “external magnetic field” and $B_\Lambda(\sigma)$ describes the
interaction of the spins in the box $\Lambda$ with the “rest of the world”. This could
be the contribution to the energy that comes from the fixed spins boundary
conditions that we considered in §5.5.⁵

For simplicity we shall treat only the case J > 0. 

Of course $B_\Lambda(\sigma)$ in (6.2.1) can be rather arbitrary and, in fact, depends on
the particular physical problem under investigation. It is subject, however,
to one constraint of physical nature: in case we were interested in letting
$\Lambda \to \infty$, we should impose the condition:

$$\lim_{\Lambda\to\infty} \frac{\max_\sigma |B_\Lambda(\sigma)|}{|\Lambda|} = 0 \qquad (6.2.2)$$

i.e. we want the condition that the energy due to $B_\Lambda(\sigma)$ should not be of the
same order as the volume of the box; furthermore $B_\Lambda$ should depend mostly
on the $\sigma_x$ with $x$ near the boundary; e.g. $B_\Lambda(\sigma) = c\,\sigma_0$ satisfies (6.2.2) but
it should also be excluded.⁶ In other words $B_\Lambda$ should be a “surface term”.


4 In some cases the Ising model is a good phenomenological model for antiferromagnetic
materials: this is the case of MnCl₂ · 4H₂O, see [FS62], [Fi67].

5 This term is usually omitted and in some sense its importance has only recently been
recognized after the work of Dobrushin, Lanford and Ruelle, see [Do68], [LR69]. In this
chapter the main purpose is to emphasize the role of this term in the theory of phase
transitions.

6 A precise condition could be that for any fixed set $D$, $\max |B_\Lambda(\sigma) - B_\Lambda(\sigma')| \to 0$ as $\Lambda \to \infty$,

The laws of statistical mechanics provide a relationship between the
microscopic Hamiltonian (6.2.1) and the macroscopic quantities appearing in the
thermodynamical theory of the system. The free energy per unit volume is
given by

$$f_\Lambda(\beta, h) = \frac{-1}{\beta |\Lambda|} \log Z(\beta, h, \Lambda, B) \qquad (6.2.3)$$

where $\beta = T^{-1}$ is the inverse temperature and

$$Z(\beta, h, \Lambda, B) = \sum_{\sigma \in U(\Lambda)} e^{-\beta H_\Lambda(\sigma)} \qquad (6.2.4)$$

is the grand canonical partition function. Furthermore the probability of
finding the system in a configuration $\sigma$ of the grand canonical ensemble
$U(\Lambda)$ is given by the Boltzmann factor:

$$\frac{e^{-\beta H_\Lambda(\sigma)}}{Z(\beta, h, \Lambda, B)} \qquad (6.2.5)$$

The grand canonical ensemble formalism based on (6.2.3), (6.2.5) corresponds
to the physical situation in which there are no constraints on the system. If
one could, by some experimental arrangement, regard for example the total
magnetization $M(\sigma) = \sum_{x \in \Lambda} \sigma_x$ as fixed: $M(\sigma) = M = m|\Lambda|$, then the
expression (6.2.3) for the free energy would no longer be appropriate.

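One should rather consider the canonical ensemble, i.e. the set of the
allowed configurations would be the set $U(\Lambda, m) \subset U(\Lambda)$ consisting of all the
$\sigma \in U(\Lambda)$ such that $\sum_{x \in \Lambda} \sigma_x = m|\Lambda|$ ($|m| < 1$), and the Thermodynamics
would be described by the function

$$g_\Lambda(\beta, h, m) = \frac{-1}{\beta |\Lambda|} \log Z(\beta, h, \Lambda, B, m) \qquad (6.2.6)$$

where

$$Z(\beta, h, \Lambda, B, m) = \sum_{\sigma \in U(\Lambda, m)} e^{-\beta H_\Lambda(\sigma)} \qquad (6.2.7)$$

and the free energy would be $\tilde f_\Lambda(\beta, h)$:

$$\tilde f_\Lambda(\beta, h) = h\, m(h) + g_\Lambda(\beta, 0, m(h)), \qquad (6.2.8)$$

where $m(h)$ is the solution of the equation:⁷

$$h = -\frac{\partial g_\Lambda(\beta, 0, m)}{\partial m} \qquad (6.2.9)$$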


if the maximum is over all pairs of spin configurations $\sigma$ and $\sigma'$ that differ only on $D$,
i.e. such that $\sigma_x = \sigma'_x$ for $x \notin D$.

7 Here we have not been precise about the problem of what $\partial/\partial m$ means, since $g_\Lambda(\beta, m)$
is only defined for certain rational values of $m$ (whose number is finite). One could,
for instance, extend $g_\Lambda(\beta, m)$ to all $m$’s by considering instead of $g_\Lambda(\beta, m)$ its convex
envelope (or also one could prefer to consider the $g_\Lambda$ obtained by linear interpolation
from (6.2.6)). This is not very satisfactory but it should not be very important for large
systems, as discussed in Chap. IV.

There is no reason for having $f_\Lambda = \tilde f_\Lambda$ since they correspond to different
physical problems; it is only when, in some sense, the fluctuations become
negligible (i.e. in the limit $\Lambda \to \infty$) that one can expect the identity between
$f$ and $\tilde f$.

Of course in general the difference between $f_\Lambda$ and $\tilde f_\Lambda$ should vanish as
$|\Lambda|^{-1}$ times $O(|\Lambda|^{(d-1)/d})$ (this means $O(\log |\Lambda|)$ for $d = 1$); but, as we shall
see on many occasions, the situation is not so simple for other quantities
such as the correlation functions or the average magnetization.

As discussed in §5.9 the inequivalence, for finite volume, of the predictions 
of the canonical and grand canonical ensembles should not be interpreted 
as meaning that statistical mechanics is only approximate when applied to 
finite systems; it simply means that in dealing with finite systems atten- 
tion must be paid to the boundary conditions as a manifestation of the 
peculiarities of the actual physical situation from which the problem under 
consideration arises. We conclude by remarking that in the canonical en- 
semble the probability of a spin configuration will be given by an expression 
similar to (6.2.5): 


$$\frac{e^{-\beta H_\Lambda(\sigma)}}{Z(\beta, h, \Lambda, B, m)}, \qquad \sigma \in U(\Lambda, m) \qquad (6.2.10)$$


§6.3. Boundary Conditions. Equilibrium States


Formulae (6.2.5), or (6.2.10), provide a complete statistical description of
the properties of the system. An alternative and often more convenient,
equally complete, description is provided by the so-called correlation
functions:

$$\langle \sigma_{x_1} \sigma_{x_2} \cdots \sigma_{x_n} \rangle_{\Lambda, B} = \frac{\sum_\sigma \sigma_{x_1} \sigma_{x_2} \cdots \sigma_{x_n}\, e^{-\beta H_\Lambda(\sigma)}}{\sum_\sigma e^{-\beta H_\Lambda(\sigma)}} \qquad (6.3.1)$$

where $\sum_\sigma$ is extended to the appropriate statistical ensemble. For instance
the average magnetization in the grand canonical ensemble $U(\Lambda)$ is

$$m_\Lambda(\beta, h) = \frac{\sum_{x \in \Lambda} \langle \sigma_x \rangle_{\Lambda, B}}{|\Lambda|} \qquad (6.3.2)$$

We shall refer to the family of correlation functions (6.3.1) (regarded as a
whole) as the “equilibrium state of the system in the box $\Lambda$”.

We call an equilibrium state, see §5.5, of the infinite system any family
$\{\langle \sigma_{x_1} \cdots \sigma_{x_n} \rangle\}$ of functions such that, for a suitable choice of the $B_\Lambda(\sigma)$:

$$\langle \sigma_{x_1} \cdots \sigma_{x_n} \rangle = \lim_{\Lambda\to\infty} \langle \sigma_{x_1} \cdots \sigma_{x_n} \rangle_{\Lambda, B_\Lambda} \qquad (6.3.3)$$

for all $n \ge 1$ and all $x_1, x_2, \ldots, x_n \in Z^d$, simultaneously.⁸

An equilibrium state for an infinite system will simply be called an
equilibrium state: it is specified by a suitable choice of a sequence $\{B_\Lambda(\sigma)\}$ of
boundary conditions satisfying the requirement (6.2.2).

Let us list a number of remarkable boundary conditions: 


(1) Open boundary condition (also called “perfect-wall” boundary
conditions). This name will be given to the case

$$B_\Lambda(\sigma) = 0 \quad \text{for all } \sigma \in U(\Lambda) \qquad (6.3.4)$$


(2) Periodic boundary conditions. This corresponds to allowing spins on
opposite faces of the box $\Lambda$ to interact through a coupling $-J$ (i.e. as the
bulk spins). Clearly this can be obtained by a suitable choice of $B_\Lambda(\sigma)$; we
shall refer to this choice as “periodic boundary conditions”.


(3) $(\varepsilon)$-boundary conditions. Let $(\xi_1, \xi_2, \ldots)$ be the $2d|\Lambda|^{(d-1)/d}$ lattice
points adjacent to the boundary of $\Lambda$. Let $\varepsilon = (\varepsilon_{\xi_1}, \varepsilon_{\xi_2}, \ldots)$, $\varepsilon_{\xi_i} = \pm 1$, be
fixed. We shall call $(\varepsilon)$-boundary condition the choice

$$B_\Lambda(\sigma) = -J \sum_{x_i \in \partial\Lambda} \sigma_{x_i} \varepsilon_{\xi_i} \qquad (6.3.5)$$

where $(x_i, \xi_i)$ are nearest neighbors.

The physical meaning of this boundary condition is clear: we imagine
that the sites neighboring the boundary $\partial\Lambda$ of $\Lambda$ are occupied by a spin
configuration $\varepsilon$ and that the latter spins interact with the spins $\sigma$ through
the same coupling constant of the bulk spins.

The cases $\varepsilon = (+1, +1, \ldots, +1)$ or $\varepsilon = (-1, -1, \ldots, -1)$ will be,
respectively, referred to as the $(+)$-boundary condition or the $(-)$-boundary
condition.


(4) In the two-dimensional case we shall be interested in another boundary
condition. Suppose that the spins on the opposite vertical sides of $\Lambda$ are
allowed to interact through a coupling $-J$ (i.e. we impose periodic boundary
conditions along the rows of $\Lambda$ only); and suppose that a set $\varepsilon_u$ of fixed spins
is located on the lattice sites adjacent to the upper base of $\Lambda$ and, similarly,
a set $\varepsilon_l$ of fixed spins is adjacent to the lower base of $\Lambda$. The spins $\varepsilon_u, \varepsilon_l$ are
allowed to interact with the nearest spins in $\Lambda$ with a coupling $-J$. We shall
naturally refer to this choice of $B_\Lambda(\sigma)$ as the $(\varepsilon_u, \varepsilon_l)$-cylindrical boundary
condition.

The particular cases

$$\begin{aligned}
\varepsilon_u &= (+1, +1, \ldots, +1), & \varepsilon_l &= (+1, +1, \ldots, +1) \\
\varepsilon_u &= (+1, +1, \ldots, +1), & \varepsilon_l &= (-1, -1, \ldots, -1)
\end{aligned} \qquad (6.3.6)$$

will be referred to, respectively, as $(+,+)$-cylindrical boundary condition or
$(+,-)$-cylindrical boundary condition.


8 This definition is essentially in [LR69] where the equivalence of the above definition
with a number of other possible definitions is shown. For instance the definition in
question is equivalent to that based on the requirement that the correlation functions
should be a solution of the equations for the correlation functions that can be derived
for lattice gases or magnetic spin systems in analogy to those we discussed for the gases
in §5.8. It is also equivalent to the other definitions of equilibrium state in terms of
tangent planes (i.e. functional derivatives of a suitable functional: see [Ru69], p. 184,
[Ga81]).


§6.4. The Ising Model in One and Two Dimensions and Zero Field


To acquire some familiarity with the model we examine some of the simplest
cases. Consider the one-dimensional Ising chain with periodic boundary
conditions. Labeling points of $\Lambda$ as $1, 2, \ldots, L$, the zero field Hamiltonian is

$$H_\Lambda(\sigma) = -J \sum_{i=1}^{L} \sigma_i \sigma_{i+1}, \qquad \sigma_{L+1} \equiv \sigma_1 \qquad (6.4.1)$$

(clearly $B_\Lambda(\sigma) = -J \sigma_L \sigma_1$). The grand canonical partition function can be
written:

$$Z_\Lambda(\beta) = \sum_\sigma e^{\beta J \sum_{i=1}^{L} \sigma_i \sigma_{i+1}} = \sum_\sigma \prod_{i=1}^{L} e^{\beta J \sigma_i \sigma_{i+1}}. \qquad (6.4.2)$$

Noting that $(\sigma_i \sigma_{i+1})^2 \equiv 1$ and therefore

$$e^{\beta J \sigma_i \sigma_{i+1}} = \cosh \beta J + \sigma_i \sigma_{i+1} \sinh \beta J \qquad (6.4.3)$$

(6.4.2) can be rewritten as

$$Z_\Lambda(\beta) = (\cosh \beta J)^L \sum_\sigma \prod_{i=1}^{L} (1 + \tanh \beta J\, \sigma_i \sigma_{i+1}). \qquad (6.4.4)$$

If one develops the product in (6.4.4) one gets a sum of terms of the form

$$(\tanh \beta J)^k\, \sigma_{i_1} \sigma_{i_1+1}\, \sigma_{i_2} \sigma_{i_2+1} \cdots \sigma_{i_k} \sigma_{i_k+1}. \qquad (6.4.5)$$

It is clear that, unless $k = 0$ or $k = L$, each of the terms (6.4.5) contains
at least one spin $\sigma_i$ which appears only once. Therefore, after performing
the sum over the $\sigma$’s, all terms (6.4.5) give a vanishing contribution to
$Z_\Lambda(\beta)$ except the two with $k = 0$ and $k = L$ which are, respectively, 1 and
$(\tanh \beta J)^L\, \sigma_1 \sigma_2\, \sigma_2 \sigma_3 \cdots \sigma_{L-1} \sigma_L\, \sigma_L \sigma_1 \equiv (\tanh \beta J)^L$. This implies

$$Z_\Lambda(\beta) = (\cosh \beta J)^L\, 2^L\, (1 + (\tanh \beta J)^L). \qquad (6.4.6)$$

Hence, by (6.2.3) with $|\Lambda| = L$:

$$-\beta f_\Lambda(\beta) = \log(2 \cosh \beta J) + \frac{1}{L} \log(1 + (\tanh \beta J)^L). \qquad (6.4.7)$$

It has to be remarked that $\beta f_\Lambda(\beta)$, as well as $\beta f(\beta) = \lim_{L\to\infty} \beta f_\Lambda(\beta) =
-\log(2 \cosh \beta J)$, are analytic in $\beta$; this fact is usually referred to as the “absence
of phase transitions” in the one-dimensional Ising model. The reader can
check, by using the above method, that the partition function in the grand
canonical ensemble and zero field but open boundary conditions (see §6.3)
is slightly different from (6.4.6) and, precisely, is equal to $(\cosh \beta J)^{L-1} 2^L$.
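
Both closed forms can be checked by direct enumeration of the $2^L$ configurations of a short chain. The following Python lines are only an illustrative sketch under the conventions of (6.4.1), with hypothetical function names; the enumeration is feasible only for small $L$.

```python
import itertools
import math


def z_chain_brute(L, beta, J=1.0, periodic=True):
    """Sum of exp(-beta * H) over all 2**L spin configurations of the chain."""
    total = 0.0
    for s in itertools.product((-1, 1), repeat=L):
        bond_sum = sum(s[i] * s[i + 1] for i in range(L - 1))
        if periodic:
            bond_sum += s[L - 1] * s[0]   # the boundary term -J * s_L * s_1
        total += math.exp(beta * J * bond_sum)
    return total


def z_chain_periodic(L, beta, J=1.0):
    """Closed form (6.4.6) for periodic boundary conditions."""
    t = math.tanh(beta * J)
    return (2.0 * math.cosh(beta * J)) ** L * (1.0 + t ** L)


def z_chain_open(L, beta, J=1.0):
    """Closed form quoted in the text for open boundary conditions."""
    return 2.0 ** L * math.cosh(beta * J) ** (L - 1)


if __name__ == "__main__":
    L, beta = 8, 0.7
    print(z_chain_brute(L, beta, periodic=True), z_chain_periodic(L, beta))
    print(z_chain_brute(L, beta, periodic=False), z_chain_open(L, beta))
```

The two boundary conditions differ only by the factor $\cosh \beta J\,(1 + (\tanh \beta J)^L)$, which does not affect the limit of $\frac1L \log Z_\Lambda$.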

Consider now the two-dimensional Ising model in a zero field and with
open boundary conditions:

$$H_\Lambda(\sigma) = -J \sum_{i=1}^{L} \sum_{j=1}^{L-1} \sigma_{i,j} \sigma_{i,j+1} - J \sum_{i=1}^{L-1} \sum_{j=1}^{L} \sigma_{i,j} \sigma_{i+1,j}. \qquad (6.4.8)$$

A better form for $H_\Lambda(\sigma)$ is the following:

$$H_\Lambda(\sigma) = -J \sum_b \tilde\sigma_b \qquad (6.4.9)$$

where $\sum_b$ denotes the sum over the bonds, i.e. over the segments $b =
[(i,j), (i,j+1)]$ or $b = [(i,j), (i+1,j)]$, and $\tilde\sigma_b$ is the product of the two
spins at the extremes of $b$ (e.g. if $b = [(i,j), (i+1,j)]$ then $\tilde\sigma_b = \sigma_{i,j} \sigma_{i+1,j}$).
The partition function can be written, as in the one-dimensional case, as

$$Z_\Lambda(\beta) = (\cosh \beta J)^{2L(L-1)} \sum_\sigma \prod_b (1 + (\tanh \beta J)\, \tilde\sigma_b). \qquad (6.4.10)$$

Developing the product we are led to a sum of terms of the type:

$$(\tanh \beta J)^k\, \tilde\sigma_{b_1} \tilde\sigma_{b_2} \cdots \tilde\sigma_{b_k} \qquad (6.4.11)$$

and we can conveniently describe this term through the geometric set of lines
$b_1, b_2, \ldots, b_k$. After the $\sum_\sigma$ is taken, many terms of the form (6.4.11) give a
vanishing contribution. The ones that give a nonvanishing contribution are
those in which the vertices of the geometric figure $b_1 \cup b_2 \cup \ldots \cup b_k$ belong
to an even number of $b_j$’s (two or four).

Fig. 6.4.1: The dashed line is the boundary of $\Lambda$.


9 The solution can also be found for instance in [NM53].


These terms are the ones such that $\tilde\sigma_{b_1} \cdot \tilde\sigma_{b_2} \cdots \tilde\sigma_{b_k} \equiv 1$. In Fig. 6.4.1a
we give a typical nonvanishing term and in Fig. 6.4.1b an example of a
vanishing term ($k = 30$).

We shall, in the following, consider the geometric figures built with $k$
segments $b_1, \ldots, b_k$ such that $\tilde\sigma_{b_1} \cdot \tilde\sigma_{b_2} \cdots \tilde\sigma_{b_k} \equiv 1$ and call each of them a
$k$-sided multi-polygon on the box $\Lambda$ (needless to say, all the $b_1, \ldots, b_k$ are
pairwise different). Let $P_k(\Lambda)$ be the number of such polygons.

The partition function is now easily written as¹⁰:

$$Z_\Lambda(\beta) = (\cosh \beta J)^{2L(L-1)}\, 2^{L^2} \sum_{k \ge 0} P_k(\Lambda)\, (\tanh \beta J)^k. \qquad (6.4.12)$$
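
For a very small box the numbers $P_k(\Lambda)$ can be obtained by listing all sets of bonds and keeping those in which every vertex belongs to an even number of bonds; (6.4.12) can then be compared with the direct sum over spin configurations. The Python sketch below does this under the assumption of open boundary conditions on an $L \times L$ box; the function names are hypothetical and the cost grows like $2^{2L(L-1)}$, so that only $L = 2, 3$ are practical.

```python
import itertools
import math


def bonds_of_square(L):
    """Nearest neighbor bonds of the L x L box with open boundary conditions."""
    bonds = []
    for i in range(L):
        for j in range(L):
            if j + 1 < L:
                bonds.append(((i, j), (i, j + 1)))
            if i + 1 < L:
                bonds.append(((i, j), (i + 1, j)))
    return bonds


def multipolygon_counts(L):
    """counts[k] = number of sets of k distinct bonds in which every vertex
    belongs to an even number of the chosen bonds."""
    bonds = bonds_of_square(L)
    counts = [0] * (len(bonds) + 1)
    for mask in range(1 << len(bonds)):
        degree = {}
        k = 0
        for idx, (u, v) in enumerate(bonds):
            if (mask >> idx) & 1:
                k += 1
                degree[u] = degree.get(u, 0) + 1
                degree[v] = degree.get(v, 0) + 1
        if all(d % 2 == 0 for d in degree.values()):
            counts[k] += 1
    return counts


def z_from_expansion(L, beta, J=1.0):
    """Right hand side of (6.4.12)."""
    t = math.tanh(beta * J)
    series = sum(p * t ** k for k, p in enumerate(multipolygon_counts(L)))
    return math.cosh(beta * J) ** (2 * L * (L - 1)) * 2 ** (L * L) * series


def z_brute(L, beta, J=1.0):
    """Direct sum of exp(-beta * H) over the 2**(L*L) spin configurations."""
    bonds = bonds_of_square(L)
    sites = [(i, j) for i in range(L) for j in range(L)]
    total = 0.0
    for values in itertools.product((-1, 1), repeat=len(sites)):
        s = dict(zip(sites, values))
        total += math.exp(beta * J * sum(s[u] * s[v] for u, v in bonds))
    return total


if __name__ == "__main__":
    print(multipolygon_counts(3))
    print(z_from_expansion(3, 0.4), z_brute(3, 0.4))
```

For $L = 3$ the box contains four elementary squares, and the multi-polygons are the boundaries of the 16 possible sets of such squares; this gives an independent check on the counts.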


§6.5. Phase Transitions. Definitions


We have already seen, in the preceding section, that the one dimensional
Ising model has no phase transitions in zero field, since both $f_\Lambda(\beta)$ and $f(\beta)$
are analytic in $\beta$.

We recall briefly in the concrete context of the Ising model the general 
considerations of Chap.V about the definition of phase transition as a phe- 
nomenon of macroscopic instability: slight changes of external conditions 
should imply dramatic changes of some macroscopic variables; it is hard to 
imagine how in such a situation thermodynamic functions, which we have 
seen to be boundary-condition independent, like the free energy, the
pressure, etc., could be analytic functions of the parameters in terms of which
they are expressed (say, temperature, chemical potential or magnetic field,
etc.).

For this reason an analytic singularity in the thermodynamic functions is 
usually thought of as a “symptom” of a phase transition and on this idea it 
would be possible to base a definition and a theory of the phenomenon of 
phase transitions. 

Here, however, we will not base the investigation of the nature of phase 
transitions in the Ising model on the search for singularities of the thermo- 
dynamic functions; we shall rather adopt and make more precise the other, 
perhaps more immediate and intuitive, approach based on the detection of
“macroscopic instabilities”, introduced in Chap. V.

This way of proceeding is more convenient for the simple reason that a
number of very clear and rather deep results have been obtained along
these lines. But it should be understood that this second approach does
not “brilliantly” avoid the difficulties of the first. It is simply an approach
to the theory of phase transitions which, so far, has asked for and provided
a less refined description of the phenomena of interest, as compared to the
description which would be expected from the analysis of the singularities of
appropriate analytic functions (an analysis still in a very primitive stage and
whose problems are often not well formulated even in the simplest cases).¹¹
For this reason it provides a wealth of remarkable properties of the phase
transition phenomenon.


10 The expansion can be used as a starting point for the combinatorial solution mentioned
above, see [LL67].

Let us now discuss in a more precise way the concept of macroscopic
instability. Consider the Ising model: we say that a phase
transition takes place at the values $(\beta, h)$ of the thermodynamic parameters
if the system is unstable with respect to boundary perturbations; i.e. if
there are at least two sequences $B_\Lambda(\sigma)$ and $B'_\Lambda(\sigma)$ of boundary terms (see
(6.2.1), (6.2.2)) such that (say, in the grand canonical ensemble)

$$\lim_{\Lambda\to\infty} \langle \sigma_{x_1} \cdots \sigma_{x_n} \rangle_{\Lambda, B_\Lambda} \neq \lim_{\Lambda\to\infty} \langle \sigma_{x_1} \cdots \sigma_{x_n} \rangle_{\Lambda, B'_\Lambda} \qquad (6.5.1)$$

for a suitable choice of $x_1, x_2, \ldots, x_n$, $n$.


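We first clarify why we say that, if (6.5.1) holds, we have a macroscopic
instability. We remark that a change in boundary conditions does not change
the extensive properties of the system such as the free energy. In fact, from
definition (6.2.4):

$$\frac{Z(\beta, h, \Lambda, B_\Lambda)}{Z(\beta, h, \Lambda, B'_\Lambda)} \le e^{\beta \max_{\sigma \in U(\Lambda)} (|B_\Lambda(\sigma)| + |B'_\Lambda(\sigma)|)} \qquad (6.5.2)$$

and therefore (6.2.2) implies

$$\lim_{\Lambda\to\infty} \frac{1}{|\Lambda|} \log Z(\beta, h, \Lambda, B_\Lambda) = \lim_{\Lambda\to\infty} \frac{1}{|\Lambda|} \log Z(\beta, h, \Lambda, B'_\Lambda). \qquad (6.5.3)$$

On the other hand, if (6.5.1) is true, intensive quantities like the correlation
functions are sensitive to the boundary conditions; for instance if

$$\lim_{\Lambda\to\infty} \langle \sigma_x \rangle_{\Lambda, B_\Lambda} \neq \lim_{\Lambda\to\infty} \langle \sigma_x \rangle_{\Lambda, B'_\Lambda} \qquad (6.5.4)$$

we realize that the local magnetization changes as a consequence of a change
in boundary condition even if the boundary is very remote.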
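
The dependence of a local average on the boundary term can be seen explicitly in a very small box. The Python sketch below (hypothetical names, brute force enumeration) computes the average of the central spin of an $L \times L$ box with all external spins fixed to a common value `eps`, i.e. with the boundary term (6.3.5) for $\varepsilon = (\pm 1, \ldots, \pm 1)$. It illustrates only the finite volume quantities entering (6.5.4): in a finite box the two averages always differ, and whether the difference survives the limit $\Lambda \to \infty$ is precisely the question that such an enumeration cannot decide.

```python
import itertools
import math


def center_magnetization(L, beta, eps, J=1.0, h=0.0):
    """Average of the central spin of the L x L box when every external
    neighboring site carries the fixed spin eps (+1 or -1)."""
    sites = [(i, j) for i in range(L) for j in range(L)]
    center = (L // 2, L // 2)
    steps = ((0, 1), (1, 0), (0, -1), (-1, 0))
    numerator = 0.0
    denominator = 0.0
    for values in itertools.product((-1, 1), repeat=len(sites)):
        s = dict(zip(sites, values))
        energy = 0.0
        for (i, j) in sites:
            # bulk bonds, each counted once (to the right and upward)
            for (di, dj) in steps[:2]:
                nb = (i + di, j + dj)
                if nb in s:
                    energy -= J * s[(i, j)] * s[nb]
            # boundary term: one fixed external spin per missing neighbor
            missing = sum(1 for (di, dj) in steps if (i + di, j + dj) not in s)
            energy -= J * eps * s[(i, j)] * missing
            energy -= h * s[(i, j)]
        weight = math.exp(-beta * energy)
        numerator += s[center] * weight
        denominator += weight
    return numerator / denominator


if __name__ == "__main__":
    for beta in (0.1, 0.3, 0.6):
        m_plus = center_magnetization(3, beta, +1)
        m_minus = center_magnetization(3, beta, -1)
        print(beta, m_plus, m_minus)
```

In zero field the two values are opposite by the spin flip symmetry, so the difference between the two boundary conditions is twice the average computed with the $(+)$-boundary condition.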

Of course once provided with a “definition” of what a phase transition is,
one has not gone very far. The real question is whether the definition reflects
what is physically expected; this implies, in particular, that one should at
least be able to prove the existence of a phase transition, in the above
sense, in cases in which one expects a transition. Hopefully the definition
and its physical interpretation should allow one to do more: for instance
to provide the tools for a closer description of typical phenomena (like the
phase separation).


11 Of course we do not attach a deep physical meaning to the difference between these two
approaches. Clearly they should be equivalent if one pretended to extract all possible
information from them. What is really important is that the first questions raised by
both approaches are very interesting and relevant from a physical point of view. One of
the goals of the analytic theory of phase transitions is to understand the nature of the
singularity at the critical point and at the “breaks” of the isotherms. A lot of interest
has been devoted to this point and a number of enlightening phenomenological results
are available. However the number of complete results on the matter is rather limited.
An idea of the type of problems that are of interest can be obtained by reading the
paper [Ka68] or the more detailed paper [Fi67].

Here we end this somewhat philosophical but necessary discussion and
in the coming sections we shall describe, in the concrete example of the
Ising model, some of the results that have been obtained since the early
1960s, when the above point of view was starting to be developed, quite
independently, by several people.


§6.6. Geometric Description of the Spin Configurations


Here we introduce a new description of the spin configurations, which we 
shall use to derive in a very elegant way the exact value of the critical 
temperature in the two-dimensional Ising model. In the following sections 
the geometric representation, introduced below, will be widely used. 

Consider an Ising model with boundary conditions of the type (6.3.5)
($(\varepsilon)$-boundary conditions) or with periodic boundary conditions (see §6.3).

Given a configuration $\sigma \in U(\Lambda)$ we draw a unit segment perpendicular to
the center of each bond $b$ having opposite spins at its extremes (in three
dimensions we draw a unit square surface element perpendicular to $b$). A
two-dimensional example of this construction is provided by Fig. 6.6.1 (where
a very special $(\varepsilon)$-boundary condition is considered).

The set of segments can be grouped into lines (or surfaces in three di- 
mensions) which separate regions where the spins are positive from regions 
where they are negative. 

It is clear that some of the lines (or surfaces, if $d = 3$) are “closed polygons”
(“closed polyhedra”, respectively) while others are not closed. It is perhaps
worth stressing that our polygons are not really such in a geometrical sense,
since they are not necessarily “self-avoiding” (see Fig. 6.6.1): however they
are such that they can intersect themselves only on vertices (and not on
sides). From a geometrical point of view a family of disjoint polygons (in
the above sense and in two dimensions) is the same thing as a multi-polygon
in the sense discussed in §6.4, Fig. 6.4.1.

In two dimensions instead of saying that a polygon is “closed” we could
equivalently say that its vertices belong to either two or four sides.

We note that the $(+)$-boundary conditions, the $(-)$-boundary conditions
and the periodic boundary conditions are such that the lines (surfaces)
associated with spin configurations are all closed polygons (polyhedra). In
the periodic case some polygons might wind around the two holes of the
torus.

In the two-dimensional case and if the boundary conditions are the
$(+,+)$-cylindrical or the $(+,-)$-cylindrical ones (see §6.3) a geometric construction
of the above type can still be performed and, also in this case, the lines are
closed polygons (some of which may “wind around” the cylinder $\Lambda$).

Fig. 6.6.1: The dashed line is the boundary of $\Lambda$; the outer spins are those fixed by the
boundary condition. The points A, B are points where an open line ends.


For a fixed boundary condition let $(\gamma_1, \gamma_2, \ldots, \gamma_h,
\lambda_1, \ldots, \lambda_k)$ be the disjoint components of the set of
lines (surfaces) associated by the above construction with a spin
configuration $\sigma \in \mathcal{U}(\Lambda)$. The $\gamma_1, \ldots,
\gamma_h$ are closed polygons and the $\lambda_1, \ldots, \lambda_k$ are
not closed. The example in Fig. 6.6.1 has only one nonclosed polygon (due
to the special nature of the boundary condition, “half + and half −”).

Clearly the correspondence between $(\gamma_1, \gamma_2, \ldots, \gamma_h,
\lambda_1, \ldots, \lambda_k)$ and $\sigma$ is, for a fixed boundary
condition, one-to-one except for the case of the periodic or open boundary
conditions, when it is one-to-two. Changing boundary conditions implies
changing the set of lines (surfaces) which describe the same spin
configuration $\sigma$.

A very important property of the above geometric description is that, if
$|\gamma|$, $|\lambda|$ denote the length (area) of the lines (surfaces)
$\gamma$ and $\lambda$, then the energy of a spin configuration is, in zero
field, given by

$$H_\Lambda(\sigma) = -J \cdot (\text{number of bonds in } \Lambda) + 2J \Big[ \sum_i |\gamma_i| + \sum_j |\lambda_j| \Big]. \qquad (6.6.1)$$

This remark easily follows from the fact that each bond $b$ contributing
$-J$ to the energy has equal spins at its extremes, while the bonds
contributing $+J$ have opposite spins at their extremes and, therefore, are
cut by a segment of unit length belonging to some $\gamma_i$ or $\lambda_j$.
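The bond-counting argument behind (6.6.1) can be checked by brute force on a small block. The following is only a minimal sketch, assuming a square $L \times L$ block with (+)-boundary conditions and $J = 1$; the function name `energy_and_contour_length` is hypothetical and not part of the text. The total contour length is computed simply as the number of bonds with opposite spins at their extremes.

```python
import random


def energy_and_contour_length(spins, J=1.0):
    """Return (H, n_bonds, n_cut) for an L x L block with (+)-boundary conditions.

    spins is a list of L rows of +1/-1 values; every site outside the block
    is treated as a fixed +1 spin. n_cut counts the bonds with opposite spins
    at their extremes, i.e. the total length of the contours.
    """
    L = len(spins)

    def s(i, j):
        inside = 0 <= i < L and 0 <= j < L
        return spins[i][j] if inside else 1

    # All nearest-neighbor bonds with at least one extreme inside the block.
    bonds = [((i, j), (i + 1, j)) for i in range(-1, L) for j in range(L)]
    bonds += [((i, j), (i, j + 1)) for i in range(L) for j in range(-1, L)]

    H = -J * sum(s(*a) * s(*b) for a, b in bonds)
    n_cut = sum(1 for a, b in bonds if s(*a) != s(*b))
    return H, len(bonds), n_cut


random.seed(0)
L = 4
sigma = [[random.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
H, n_bonds, n_cut = energy_and_contour_length(sigma)
# Both numbers agree: direct energy versus -J*N_bonds + 2J*(contour length).
print(H, -n_bonds + 2 * n_cut)
```

With this convention the number of bonds is $2L(L+1)$, the value that appears in the exponent of (6.6.5).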

If $N_\Lambda$ = (number of bonds in $\Lambda$), the partition function
becomes (in zero field and with fixed spin boundary conditions)

$$Z_\Lambda(\beta) = \sum_{\gamma_1,\ldots,\gamma_h} \sum_{\lambda_1,\ldots,\lambda_k} e^{-2\beta J (\sum_i |\gamma_i| + \sum_j |\lambda_j|)} \, e^{\beta J N_\Lambda} \qquad (6.6.2)$$

where the sum runs over the set of lines associated with a spin
configuration $\sigma \in \mathcal{U}(\Lambda)$ and with the boundary
condition under consideration.

In the case of periodic or open boundary conditions there may be no
$\lambda$'s (this happens in the periodic case) and there is an extra
factor 2 (because in this case the correspondence between $\sigma$ and
$(\gamma_1, \ldots, \gamma_h)$ is two-to-one); in the periodic case:

$$Z_\Lambda(\beta) = 2 \sum_{\gamma_1,\ldots,\gamma_h} e^{-2\beta J \sum_i |\gamma_i|} \, e^{\beta J N_\Lambda} \qquad (6.6.3)$$

and $N_\Lambda = 2L^2$.


From the above considerations we draw two important consequences:


(I) If the boundary condition is fixed, the probability of a spin
configuration $\sigma$ described by $\gamma_1, \ldots, \gamma_h,
\lambda_1, \ldots, \lambda_k$ is proportional to:

$$e^{-2\beta J (\sum_i |\gamma_i| + \sum_j |\lambda_j|)}. \qquad (6.6.4)$$


(II) In the case of (+) or (−) boundary conditions and two dimensions we
remark that $\sum_{\gamma_1,\ldots,\gamma_h}$ in (6.6.2) is a sum over
“multi-polygons” lying on a shifted lattice and in a box $\Lambda'$
containing $(L+1)^2$ spins (see the definition in §1.6) and, therefore, if
$\sum_i |\gamma_i| = k$ we have

$$Z_\Lambda(\beta) = e^{2L(L+1)\beta J} \sum_{k \ge 0} P_k(\Lambda') \, e^{-2\beta J k} \qquad (6.6.5)$$

where $P_k(\Lambda')$ is the number of different multi-polygons with
perimeter $k$ (see (6.4.12)).


If we now define $\beta^*$ through

$$\tanh \beta J = e^{-2\beta^* J} \qquad (6.6.6)$$

and apply (6.6.5) with $\Lambda$ replaced by a volume $\Lambda'$ with side
$L - 1$, then a comparison between (6.6.5) and (6.4.12) yields

$$\frac{Z_\Lambda(\beta)}{2^{L^2} (\cosh \beta J)^{2L(L-1)}} = \frac{Z_{\Lambda'}(\beta^*)}{e^{2\beta^* J L(L-1)}}. \qquad (6.6.7)$$

Here $Z_\Lambda(\beta)$ is computed with open boundary conditions, while
$Z_{\Lambda'}(\beta^*)$ is computed with (+)-boundary conditions.
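The identity (6.6.7) can be tested numerically by brute-force enumeration on very small boxes. The sketch below assumes $K = \beta J$ as the only parameter, an $L \times L$ block with open boundary conditions on the left-hand side, and an $(L-1) \times (L-1)$ block framed by $+1$ spins on the right-hand side; the helper names `z_open`, `z_plus` and `duality_sides` are illustrative choices, not notation from the text.

```python
import math
from itertools import product


def z_open(L, K):
    """Partition function of an L x L block, open boundary conditions, K = beta*J."""
    def idx(i, j):
        return i * L + j

    bonds = [(idx(i, j), idx(i + 1, j)) for i in range(L - 1) for j in range(L)]
    bonds += [(idx(i, j), idx(i, j + 1)) for i in range(L) for j in range(L - 1)]
    return sum(
        math.exp(K * sum(s[a] * s[b] for a, b in bonds))
        for s in product((-1, 1), repeat=L * L)
    )


def z_plus(M, K):
    """Partition function of an M x M block surrounded by +1 spins, K = beta*J."""
    bonds = [((i, j), (i + 1, j)) for i in range(-1, M) for j in range(M)]
    bonds += [((i, j), (i, j + 1)) for i in range(M) for j in range(-1, M)]
    total = 0.0
    for s in product((-1, 1), repeat=M * M):
        def spin(site):
            i, j = site
            return s[i * M + j] if 0 <= i < M and 0 <= j < M else 1

        total += math.exp(K * sum(spin(a) * spin(b) for a, b in bonds))
    return total


def duality_sides(L, K):
    """Return the two sides of (6.6.7) for a box of side L at coupling K."""
    K_star = -0.5 * math.log(math.tanh(K))  # (6.6.6): tanh K = exp(-2 K*)
    n = L * (L - 1)
    lhs = z_open(L, K) / (2 ** (L * L) * math.cosh(K) ** (2 * n))
    rhs = z_plus(L - 1, K_star) / math.exp(2 * K_star * n)
    return lhs, rhs


print(duality_sides(3, 0.7))
```

For $L = 2$ both sides reduce to $1 + (\tanh \beta J)^4$, which gives a quick consistency check by hand; larger $L$ quickly becomes expensive because the enumeration runs over $2^{L^2}$ configurations.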

If we assume that the bulk free energy $f(\beta) = \lim_{\Lambda\to\infty}
\frac{1}{|\Lambda|} \log Z_\Lambda(\beta)$ has one and only one singularity
as a function of $\beta$, for $\beta$ real, then (6.6.7) can be used to
locate the singularity. In fact this implies

$$f(\beta) - \log 2(\cosh \beta J)^2 = f(\beta^*) - 2\beta^* J \qquad (6.6.8)$$

having used the fact that the free energy is boundary-condition independent,
see (6.5.3). Hence a singularity in $\beta$, if unique, can take place only
when $\beta = \beta^*$, i.e. for $\beta = \beta_{c,O}$ such that:

$$\tanh \beta_{c,O} J = e^{-2\beta_{c,O} J} \qquad (6.6.9)$$

which, indeed, has been shown by Onsager, [On44], to be the exact value
of the critical temperature defined as the value of $\beta$ where $f(\beta)$
is singular (in the sense that its second derivative diverges).12
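Equation (6.6.9) is easy to solve numerically. The following is a small sketch, with $K = \beta J$ and hypothetical helper names, that finds the fixed point of the map $\beta \to \beta^*$ by bisection and compares it with the closed form $\beta_{c,O} J = \frac12 \log(1 + \sqrt2) \approx 0.4407$, which follows from (6.6.9) by setting $y = e^{-2\beta J}$ and solving $y^2 + 2y - 1 = 0$.

```python
import math


def beta_star(K):
    """Dual coupling K* = beta* J defined by tanh(K) = exp(-2 K*), see (6.6.6)."""
    return -0.5 * math.log(math.tanh(K))


def self_dual_point(lo=0.1, hi=2.0, tol=1e-14):
    """Bisection for the root of tanh(K) - exp(-2K), i.e. (6.6.9) with K = beta*J."""
    def g(K):
        return math.tanh(K) - math.exp(-2.0 * K)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


K_c = self_dual_point()
print(K_c, 0.5 * math.log(1 + math.sqrt(2)))
```

The map `beta_star` is an involution (applying it twice returns the starting coupling), which is why a unique singularity can only sit at its fixed point.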

In the next section we outline the theory of phase transitions in the Ising
model as a macroscopic instability and a spontaneous breakdown of up-down
symmetry. We shall concentrate, for geometric reasons, on the
two-dimensional Ising model but, unless explicitly stated, the results hold
in all dimensions $d \ge 2$.


12 This geometric picture of the spin configurations can be traced back at
least as far as Peierls’ paper, [Pe36], and has been used, together with
formula (6.4.12), to derive (6.6.8) (the “Kramers-Wannier duality” relation)
and (6.6.9), [KW41]. A recent interesting generalization of the duality
concept has been given in [We71], where some very interesting applications
can be found as well as references to earlier works. The duality relation
between (+) or (−) boundary conditions and open boundary conditions (which
is used here) has been realized by several people. The reader can find other
similar interesting relations in [BJS72] and further applications came in
[BGJS73]. Duality has found many more applications, see for instance
[GHM77] and, for a recent one, [BC94]. In particular a rigorous proof of the
correctness of the Onsager-Yang value of the spontaneous magnetization is
derived in [BGJS73].


§6.7. Phase Transitions. Existence


In this section we shall show that the (+)-boundary conditions and the
(−)-boundary conditions (see §6.3) produce, if the temperature is low
enough, different equilibrium states (see §6.3), i.e. for large $\beta$ the
correlation functions are different and the difference does not vanish in
the limit $\Lambda \to \infty$ (see (6.5.1)).

More precisely we shall prove that if $h = 0$ and $\beta$ is large enough
then

$$\lim_{\Lambda\to\infty} \langle \sigma_x \rangle_{\Lambda,\pm} = \pm m^*(\beta) \ne 0 \qquad (6.7.1)$$

where the index $\pm$ refers to the boundary conditions.

Clearly (6.7.1) shows that the magnetization is unstable (in zero field and
at low temperature) with respect to boundary perturbations. We also remark
that by using periodic boundary conditions one would obtain still another
result:

$$\lim_{\Lambda\to\infty} \langle \sigma_x \rangle_{\Lambda,\mathrm{periodic}} = 0, \qquad \text{if } h = 0 \qquad (6.7.2)$$

because $\langle \sigma_x \rangle_{\Lambda,\mathrm{periodic}} = 0$, if
$h = 0$, for obvious symmetry reasons.

After a description of the very simple and instructive proof of (6.7.1) we 
shall go further and discuss more deeply the character of the phase transi- 
tion. 

As already remarked, spin configurations $\sigma \in \mathcal{U}(\Lambda)$
are described in terms of closed polygons $(\gamma_1, \gamma_2, \ldots,
\gamma_n)$ if the boundary condition is (+) or (−) and the probability of a
configuration $\sigma$ described by $(\gamma_1, \gamma_2, \ldots,
\gamma_n)$ is proportional to (see (6.6.4)):

$$e^{-2\beta J \sum_i |\gamma_i|}. \qquad (6.7.3)$$

Below we identify $\sigma$ with $(\gamma_1, \gamma_2, \ldots, \gamma_n)$
(with the boundary condition fixed).

Let us estimate $\langle \sigma_x \rangle_{\Lambda,+}$. Clearly
$\langle \sigma_x \rangle_{\Lambda,+} = 1 - 2P_{\Lambda,+}(-)$, where
$P_{\Lambda,+}(-)$ is the probability that in the site $x$ the spin is $-1$.

We remark that if the site $x$ is occupied by a negative spin then the point
$x$ is inside some contour $\gamma$ associated with the spin configuration
$\sigma$ under consideration. Hence if $\rho(\gamma)$ is the probability
that a given contour belongs to the set of contours describing a
configuration $\sigma$, we deduce

$$P_{\Lambda,+}(-) \le \sum_{\gamma \, o \, x} \rho(\gamma) \qquad (6.7.4)$$

where $\gamma \, o \, x$ means that $\gamma$ “surrounds” $x$.

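Let us now estimate $\rho(\gamma)$: if $\Gamma = (\gamma_1, \ldots,
\gamma_n)$ is a spin configuration and if the symbol $\Gamma \,
\mathrm{comp} \, \gamma$ means that the contour $\gamma$ is “disjoint” from
(or “compatible” with) $\gamma_1, \ldots, \gamma_n$ (i.e. $\{\gamma \cup
\Gamma\}$ is a new spin configuration), then

$$\rho(\gamma) = \frac{\sum_{\Gamma \ni \gamma} e^{-2\beta J \sum_{\gamma' \in \Gamma} |\gamma'|}}{\sum_{\Gamma} e^{-2\beta J \sum_{\gamma' \in \Gamma} |\gamma'|}} \equiv e^{-2\beta J |\gamma|} \, \frac{\sum_{\Gamma \, \mathrm{comp} \, \gamma} e^{-2\beta J \sum_{\gamma' \in \Gamma} |\gamma'|}}{\sum_{\Gamma} e^{-2\beta J \sum_{\gamma' \in \Gamma} |\gamma'|}}. \qquad (6.7.5)$$

Before continuing the analysis let us remark that if $\sigma = (\gamma,
\gamma_1, \gamma_2, \ldots, \gamma_n)$ then $\sigma' = (\gamma_1, \gamma_2,
\ldots, \gamma_n)$ is obtained from $\sigma$ by reversing the sign of the
spins inside $\gamma$; this can be used to build an intuitive picture of the
second equation in (6.7.5). Clearly the last ratio in (6.7.5) does not
exceed 1; hence:

$$\rho(\gamma) \le e^{-2\beta J |\gamma|}. \qquad (6.7.6)$$

Letting $p = |\gamma|$ and observing that there are at most $3^p$ different
shapes of $\gamma$ with perimeter $p$ and at most $p^2$ congruent
$\gamma$'s containing (in their interior) $x$, we deduce from (6.7.4),
(6.7.6):

$$P_{\Lambda,+}(-) \le \sum_{p=4}^{\infty} p^2 \, 3^p \, e^{-2\beta J p}. \qquad (6.7.7)$$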


Hence if $\beta \to \infty$ (i.e. the temperature $T \to 0$) this
probability can be made as small as we like and, therefore, $\langle
\sigma_x \rangle_{\Lambda,+}$ is as close to 1 as we like provided $\beta$
is large enough. It is of fundamental importance that the closeness of
$\langle \sigma_x \rangle_{\Lambda,+}$ to 1 is both $x$ and $\Lambda$
independent.
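To get a feeling for how large $\beta$ has to be, the series in (6.7.7) can be summed numerically. The following is a rough sketch with $K = \beta J$; the function name `peierls_bound` and the truncation parameters are assumptions made for illustration. The series converges only if $3e^{-2\beta J} < 1$, and the bound is useful only when it is below $1/2$, so that $\langle \sigma_x \rangle_{\Lambda,+} \ge 1 - 2 \cdot (\text{bound}) > 0$.

```python
import math


def peierls_bound(K, p_max=10_000):
    """Sum over p >= 4 of p^2 * 3^p * exp(-2*K*p), the right-hand side of (6.7.7)."""
    x = 3.0 * math.exp(-2.0 * K)
    if x >= 1.0:
        raise ValueError("series diverges: need 3*exp(-2K) < 1")
    total = 0.0
    for p in range(4, p_max + 1):
        term = p * p * x ** p
        total += term
        # Stop once the terms no longer change the sum at double precision.
        if term < 1e-18 * max(total, 1e-300):
            break
    return total


for K in (1.0, 1.5, 2.0):
    bound = peierls_bound(K)
    print(K, bound, 1.0 - 2.0 * bound)
```

At $\beta J = 1$ the bound still exceeds $1/2$ and says nothing, while at $\beta J = 1.5$ it is already of order $10^{-2}$. This crude estimate is far from the true threshold (6.6.9), which is the price of the generous counting factors $3^p$ and $p^2$.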

A similar argument for the (−)-boundary condition, or the remark that
$\langle \sigma_x \rangle_{\Lambda,-} = -\langle \sigma_x
\rangle_{\Lambda,+}$, allows us to conclude that, at large $\beta$,
$\langle \sigma_x \rangle_{\Lambda,-} \ne \langle \sigma_x
\rangle_{\Lambda,+}$ and the difference between the two quantities is
uniform in $\Lambda$. Hence we have completed the proof (“Peierls’
argument”) of the fact that there is a strong instability with respect to
the boundary conditions of some correlation functions.13

We can look upon the above phenomenon as a spontaneous breakdown of 
up-down symmetry: the Hamiltonian of the model is symmetric, in a zero 
field, with respect to spin reversal if one neglects the boundary terms; the 
phase transition manifests itself in the fact that there are equilibrium states 
in which the symmetry is violated “only on the boundary” and which are 
not symmetric even in the limit when the boundary recedes to infinity. 


§6.8. Microscopic Description of the Pure Phases


The description of the phase transition presented in §6.7 can be made more
precise from the physical point of view as well as from the mathematical 
point of view. A deep and physically clear description of the phenomenon 
is provided by the theorem below, which also makes precise some ideas 
familiar from a model, which we shall not discuss here, but which plays 
an important role in the development of the theory of phase transitions: 
namely the droplet model, [Fi67c]. 

Assume that the boundary condition is the (+)-boundary condition and
describe a spin configuration $\sigma \in \mathcal{U}(\Lambda)$ by means of
the associated closed disjoint polygons $(\gamma_1, \ldots, \gamma_n)$.


13 The above proof is due to R.B. Griffiths and, independently, to R.L.
Dobrushin and it is a mathematically rigorous version of [Pe36].

We regard the ensemble $\mathcal{U}(\Lambda)$ as equipped with the
probability distribution attributing to $\sigma = (\gamma_1, \ldots,
\gamma_n)$ a probability proportional to (6.7.3).
Then the following theorem holds:


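Theorem. If $\beta$ is large enough there exist $C > 0$ and $\rho(\gamma) >
0$ with $\rho(\gamma) \le e^{-2\beta J |\gamma|}$ and such that a spin
configuration $\sigma$ randomly chosen out of the ensemble
$\mathcal{U}(\Lambda)$ will contain, with probability approaching 1 as
$\Lambda \to \infty$, a number $K_{(\gamma)}(\sigma)$ of contours congruent
to $\gamma$ such that

$$\big| K_{(\gamma)}(\sigma) - \rho(\gamma) |\Lambda| \big| \le C \sqrt{|\Lambda|} \, e^{-\beta J |\gamma|} \qquad (6.8.1)$$

and this relation holds simultaneously for all $\gamma$'s. In three
dimensions one would have $|\Lambda|^{2/3}$ instead of $\sqrt{|\Lambda|}$.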


It is clear that the above theorem means that there are very few contours
(and that the larger they are the smaller is, in absolute and relative
value, their number). The inequality (6.8.1) also implies that for some
$C(\beta)$ there are no contours with perimeter $|\gamma| > C(\beta)
\log|\Lambda|$ (with probability approaching 1 as $\Lambda \to \infty$):
this happens when $\rho(\gamma)|\Lambda| < 1$ (because
$K_{(\gamma)}(\sigma)$ is an integer and the right-hand side of (6.8.1) is
$< 1$). Hence a typical spin configuration in the grand canonical ensemble
with (+)-boundary conditions is such that the large majority of the spins is
“positive” and, in the “sea” of positive spins, there are a few negative
spins distributed in small and rare regions (in a number, however, still of
order of $|\Lambda|$).
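The logarithmic bound on the perimeters can be made explicit with a short computation. The following is only a sketch of one way to organize the estimate: the thresholds $1/2$ are a convenient choice made here, not taken from the text, and $\rho(\gamma) \le e^{-2\beta J|\gamma|}$ is the bound stated in the theorem.

```latex
% Sketch: why (6.8.1) excludes long contours (thresholds 1/2 chosen for convenience).
\begin{align*}
K_{(\gamma)}(\sigma)
  &\le \rho(\gamma)\,|\Lambda| + C\sqrt{|\Lambda|}\,e^{-\beta J|\gamma|}
   \le |\Lambda|\,e^{-2\beta J|\gamma|} + C\sqrt{|\Lambda|}\,e^{-\beta J|\gamma|},\\
|\Lambda|\,e^{-2\beta J|\gamma|} < \tfrac12
  &\iff |\gamma| > \frac{\log(2|\Lambda|)}{2\beta J},\\
C\sqrt{|\Lambda|}\,e^{-\beta J|\gamma|} < \tfrac12
  &\iff |\gamma| > \frac{\log(4C^2|\Lambda|)}{2\beta J}.
\end{align*}
```

When both conditions hold, $K_{(\gamma)}(\sigma) < 1$ and hence $K_{(\gamma)}(\sigma) = 0$, since it is an integer. Under these assumptions any constant $C(\beta) > 1/(2\beta J)$ works once $|\Lambda|$ is large enough.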

Another nice result which follows from the results of §6.7, and from some
improvement, [BS67], of them, concerns the behavior of the equation of
state near the phase transition region at low (enough) temperatures.

If $\Lambda$ is finite the graph of $h \to m_\Lambda(\beta, h)$ will have a
rather different behavior depending on the possible boundary conditions;
e.g. if the boundary condition is (−) or (+) one gets respectively the
results depicted in Fig. 6.8.1 and Fig. 6.8.2, where $m^*(\beta)$ denotes
the spontaneous magnetization $\lim_{h \to 0^+} \lim_{\Lambda\to\infty}
m_\Lambda(\beta, h)$.


The thermodynamic limit $m(\beta, h) = \lim_{\Lambda\to\infty}
m_\Lambda(\beta, h)$ exists for all $h \ne 0$ and the resulting graph is as
shown in Fig. 6.8.4.


At $h = 0$ the limit is not well defined and it depends on the boundary
condition (as it must). It can be proven, if $\beta$ is large enough, that

$$\lim_{h \to 0^+} \frac{\partial m(\beta, h)}{\partial h} = \chi(\beta) \qquad (6.8.2)$$

is a finite number (i.e. the angle between the vertical part of the graph
and the rest is sharp, [BS67]).

The above considerations and results also provide a clear idea of what a 
phase transition for a finite system means. 

It is often stated that a finite system “does not” show “sharp” phase tran- 
sitions; however this statement is always made when considering a fixed 
boundary condition, usually of periodic or perfect-wall type. By taking into 
account the importance of the boundary terms we see which kind of phe- 
nomena occur in a finite system, if the corresponding infinite system has a 
sharp phase transition. 

The next section is devoted to the discussion of a number of problems con- 
cerning the generality of the definition of a phase transition as an instability 
with respect to the boundary perturbations, and other related problems, in 
the special case of the Ising model that we are discussing. 


§6.9. Results on Phase Transitions in a Wider Range of Temperature


An unpleasant limitation of the results discussed above is the condition
of low temperature (“$\beta$ large enough”). The results of the preceding
sections show that, at a low enough temperature, the Ising model is unstable
with respect to changes in the boundary conditions. A natural question is
whether one can go beyond the low-temperature region and fully describe
the phenomena in the region where the instability takes place and first
develops. In the particular case of two dimensions it would also be natural
to ask whether the minimum value of $\beta$ (i.e. the maximum temperature)
to which an instability is associated is the one given by (6.6.9), which
corresponds to the value of $\beta$ where the infinite volume free energy
$f(\beta)$ has a singularity, the critical point.

The above types of questions are very difficult and are essentially related to 
the already mentioned theory of the phase transitions based on the search 
and study of analytic singularities of the thermodynamic functions (which 
is a theory, however, that has still to be really developed). 

Nevertheless a number of interesting partial results are known, which con- 
siderably improve the picture of the phenomenon of the phase transitions 
emerging from the previous sections. A list of such results follows: 


(1) It can be shown that the zeros of the polynomial in $z = e^{\beta h}$
given by the product of $z^{|\Lambda|}$ times the partition function (6.2.4)
with periodic or perfect-wall boundary conditions lie on the unit circle:
$|z| = 1$ (“Lee-Yang’s theorem”). It is easy to deduce, with the aid of
Vitali’s convergence theorem for equibounded analytic functions, that this
implies that the only singularities of $f(\beta, h)$ in the region $0 <
\beta < \infty$, $-\infty < h < +\infty$ can be found at $h = 0$.

A singularity appears if and only if the point $z = 1$ is an accumulation
point of the limiting distribution (as $\Lambda \to \infty$) of the zeros on
the unit circle. In fact, if the zeros in question are $z_1, \ldots,
z_{2|\Lambda|}$ then

$$\frac{1}{|\Lambda|} \log z^{|\Lambda|} Z(\beta, h, \Lambda, \mathrm{periodic}) = 2\beta J + \beta h + \frac{1}{|\Lambda|} \sum_{i=1}^{2|\Lambda|} \log(z - z_i) \qquad (6.9.1)$$

and if $|\Lambda|^{-1} \cdot$ (number of zeros of the form $z_j =
e^{i\vartheta_j}$ with $\vartheta \le \vartheta_j \le \vartheta +
d\vartheta$) $\to \rho_\beta(\vartheta) \frac{d\vartheta}{2\pi}$ as $\Lambda
\to \infty$ in a suitable sense, we get from (6.9.1),

$$\beta f(\beta, h) = 2\beta J + \beta h + \frac{1}{2\pi} \int_{-\pi}^{\pi} \log(z - e^{i\vartheta}) \, \rho_\beta(\vartheta) \, d\vartheta \qquad (6.9.2)$$

where the second term comes from the $|z|^{|\Lambda|}$ appearing in (6.9.1).
The existence of the measure $\rho_\beta(\vartheta)
\frac{d\vartheta}{2\pi}$ such that (6.9.2) is true follows, after some
thought, from the existence of the thermodynamic limit
$\lim_{\Lambda\to\infty} f_\Lambda(\beta, h) = f(\beta, h)$.14
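The statement of Lee-Yang’s theorem can be observed directly on a toy system. The sketch below is an illustration under several assumptions that are not in the text: it uses a small ring of $N$ spins instead of a two-dimensional box, it works with the variable $w = z^2 = e^{2\beta h}$ (so that $z^{N} Z$ becomes a polynomial $P(w)$ of degree $N$), and it counts zeros on $|w| = 1$ through the sign changes of the real function $g(\vartheta) = e^{-iN\vartheta/2} P(e^{i\vartheta})$ rather than by computing roots. The function names are hypothetical.

```python
import math
from itertools import product


def lee_yang_coefficients(n_spins, bonds, K):
    """Coefficients c[k] of P(w) = sum_k c[k] w^k, where w = z^2 = exp(2*beta*h).

    c[k] sums exp(K * sum_bonds s_i s_j) over the configurations with k
    up-spins, so that z^N * Z = P(z^2) for N = n_spins and K = beta*J.
    """
    c = [0.0] * (n_spins + 1)
    for spins in product((-1, 1), repeat=n_spins):
        coupling = sum(spins[i] * spins[j] for i, j in bonds)
        c[spins.count(1)] += math.exp(K * coupling)
    return c


def count_unit_circle_zeros(c, grid=20000):
    """Count sign changes of g(theta) = exp(-i n theta / 2) * P(exp(i theta)).

    Spin-flip symmetry gives c[k] == c[n-k], so g is real; each sign change
    on (0, 2*pi) marks a zero of P lying on the unit circle.
    """
    n = len(c) - 1

    def g(theta):
        return sum(ck * math.cos((k - n / 2) * theta) for k, ck in enumerate(c))

    values = [g(2 * math.pi * (m + 0.5) / grid) for m in range(grid)]
    return sum(1 for a, b in zip(values, values[1:]) if a * b < 0)


N = 6
ring = [(i, (i + 1) % N) for i in range(N)]
coeffs = lee_yang_coefficients(N, ring, K=0.5)
print(count_unit_circle_zeros(coeffs), "zeros found on |w| = 1 out of", N)
```

Since $P$ has degree $N$, finding $N$ sign changes means that all its zeros lie on the unit circle. The sign-change count misses zeros of even multiplicity, so it is a lower bound in general; for the ring the zeros are simple.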


(2) It can be shown that the zeros of the partition function do not move
too much under small perturbations of the spin-spin potential even if one
allows “many spin” interactions; i.e. even if one perturbs the Hamiltonian
(6.2.1) with perfect-wall boundary conditions into

$$H'_\Lambda(\sigma) = H_\Lambda(\sigma) + (\delta H_\Lambda)(\sigma), \qquad (\delta H_\Lambda)(\sigma) = \sum_{k \ge 1} \; \sum_{x_1, \ldots, x_k \in \Lambda} J'(x_1, \ldots, x_k) \, \sigma_{x_1} \cdots \sigma_{x_k} \qquad (6.9.3)$$

where $J'(X)$ is a function of the set $X = (x_1, \ldots, x_k)$ such that

$$\|J'\| = \sup_{y \in Z^d} \sum_{X \ni y} |J'(X)| \qquad (6.9.4)$$

is small enough.

More precisely, suppose that one knows that, when $J' = 0$, the zeros of
the partition function in the variable $z = e^{\beta h}$ lie in a certain
closed set $N$ of the $z$-plane. Then if $J' \ne 0$ they lie in a closed set
$N'$ contained in a neighborhood of $N$ which can be made as small as we
please when $\|J'\| \to 0$.

This result, [Ru73b], allows us to make a connection between the
analyticity properties and the boundary condition instability as described
in (3) below.


14 Here the symbol $\rho_\beta(\vartheta) d\vartheta / 2\pi$ should not be
taken too seriously; it really denotes a measure on the circle and this
measure is not necessarily $d\vartheta$-continuous. Also the “convergence”
statement really means the existence of a measure such that (6.9.2) holds
for all real $z$. The original proof of this theorem is in [LY52]. A much
stronger and more general statement, Ruelle’s theorem, leading in particular
to the Lee-Yang theorem, is in [Ru71a]: it has been one of the most
remarkable among a series of improvements and generalizations of Lee-Yang’s
theorem (among which I quote [As70], [Ru71b]).

(3) There can be a boundary condition instability only in zero field and, in
this case, if and only if the spectrum $\rho_\beta(\vartheta)$ does not
vanish around $\vartheta = 0$: one says that there is a gap around 0 if
$\rho_\beta(\vartheta) = 0$ near $\vartheta = 0$.

The proof of this result relies upon (2) and the remark that the correlation
functions are functional derivatives with respect to $J'(x_1, \ldots, x_k)$
of the free energy defined by the Hamiltonian (6.9.3), [Ru73b].


(4) Another question is whether the boundary condition instability is al- 
ways revealed by the one-spin correlation function (as in §6.7) or whether 
it might be shown only by some correlation functions of higher order. This 
question is answered by the following result. 

There can be a boundary condition instability (at $h = 0$ and $\beta$
fixed) if and only if

$$\lim_{h \to 0^-} m(\beta, h) \ne \lim_{h \to 0^+} m(\beta, h). \qquad (6.9.5)$$

Note that, in view of what was said above (point (3)), $m(\beta, h) =
\lim_{\Lambda\to\infty} m_\Lambda(\beta, h)$ is boundary condition
independent as long as $h \ne 0$.

In other words there is a boundary condition instability if and only if 
there is spontaneous magnetization. This rules out the possibility that the 
phase transition could manifest itself through an instability of some higher- 
order correlation function which, practically, might be unobservable from 
an experimental point of view [ML72]. 


(5) Point (4) implies that a natural definition of the critical temperature
$T_c$ is to say that it is the least upper bound of the $T$'s such that
(6.9.5) is true ($T = \beta^{-1}$). It is clear that, at this temperature,
the gap around $\vartheta = 0$ closes and the function $f(\beta, h)$ has a
singularity at $h = 0$ for $\beta > \beta_c = T_c^{-1}$. It can in fact be
proven that if (6.9.5) is true for a given $\beta_0$ then it is true for all
$\beta > \beta_0$, [Gr67], [Fi65].


(6) The location of the singularities of $f(\beta, 0)$ as a function of
$\beta$ remains an open question for $d = 3$, see however [Gr67], [Fi65]. In
particular the question of whether there is a singularity of $f(\beta, 0)$
at $\beta = \beta_c$ is open. The identity $\beta_c = \beta_{c,O}$ for the
two-dimensional Ising model has been proved in [BGJS73] and, independently,
in [AM73].


(7) Finally another interesting question can be raised. For $\beta >
\beta_c$ we have instability with respect to the boundary conditions (see
(6) above): how strong is this instability? In other words, how many “pure”
phases can exist?


Our intuition, in the case of the Ising model, suggests that there should be 
only two phases: the positively magnetized and the negatively magnetized 
ones. 

To answer the above question in a precise way it is necessary to agree
on what a pure phase is, [Ru69], p. 161. We shall call “pure phase” an
equilibrium state (see footnote 8, §5.5 and (5.9.18)) if it is
translationally invariant and if the correlation functions have a cluster
property of the form

$$\langle \sigma_{x_1} \cdots \sigma_{x_n} \sigma_{y_1 + a} \cdots \sigma_{y_m + a} \rangle \xrightarrow[a \to \infty]{} \langle \sigma_{x_1} \cdots \sigma_{x_n} \rangle \langle \sigma_{y_1} \cdots \sigma_{y_m} \rangle \qquad (6.9.6)$$

where convergence is understood in a very weak sense, i.e. the weakest
sense which still allows us to deduce that the fluctuations of the extensive
quantities are $o(|\Lambda|)$, [Fi65], which is

$$\frac{1}{|\Lambda|} \sum_{a \in \Lambda} \langle \sigma_{x_1} \cdots \sigma_{x_n} \sigma_{y_1 + a} \cdots \sigma_{y_m + a} \rangle \xrightarrow[\Lambda \to \infty]{} \langle \sigma_{x_1} \cdots \sigma_{x_n} \rangle \langle \sigma_{y_1} \cdots \sigma_{y_m} \rangle \qquad (6.9.7)$$

i.e. the convergence in (6.9.6) takes place in the “Cesaro’s limit” sense.

It can be proved that, in the case of the Ising model, the two states
obtained as limits for $\Lambda \to \infty$ of finite volume states (see
§6.3) corresponding to (+)-boundary conditions or (−)-boundary conditions
are different for $\beta > \beta_c$ and are pure phases in the sense of
(6.9.7) above.15

Actually it can be proved that, in this case, the limits (6.9.6) exist in the 
ordinary sense, [GMM72], rather than in the Cesaro sense, and that at low 
temperature they are approached exponentially fast, see [MS67]. 

Furthermore, if $\beta$ is large enough (e.g. in two dimensions about 10%
larger than $\beta_c$), these two pure phases exhaust the set of pure phases
[GM72a], [Ma72]. For $\beta$ close to $\beta_c$, however, the question is
much more difficult: nevertheless
it has been completely solved in a remarkable series of papers based on the 
key work [Ru79b]; see [Hi81], [Ai80]. The work [Ru79b] did provide a real 
breakthrough and a lot of new ideas for the theory of the Ising model and 
percolation theory, [Ru81], [Hi97]. The solution of this problem has led to 
the introduction of many new ideas and techniques in statistical mechanics 
and probability theory. 

Another approach, very rich in results, to the theory of correlation
functions originates from the combination of the Griffiths, FKG and other
inequalities, see [Gr67], [FKG71], [Le74], with the infrared bounds
introduced in the work of Mermin and Wagner, [MW66], see §5.8. I only quote
here the work [Fr81], mentioned in §5.8 of Chap. V, where the reader can
find a very interesting analysis of the behavior at the critical point of
various correlations and a clear discussion of the relevance of the
dimension of the lattice (if the dimension is $\ge 5$ the correlations are
“trivial”).

Having discussed some exact results about the structure of the phase tran- 
sition and the nature of pure phases, we shall turn in the next section to 
the phenomenon of the coexistence of two pure phases. 


§6.10. Separation and Coexistence of Pure Phases. Phenomenological
Considerations


Our intuition about the phenomena connected with the classical phase
transitions is usually based on the properties of the liquid-gas phase
transition; this transition is experimentally investigated in situations in
which the total number of particles is fixed (canonical ensemble) and in the
presence of an external field (gravity).


15 This is an unpublished result of R.B. Griffiths. His proof is reported in [GMM72].

The importance of such experimental conditions is obvious; the external 
field produces a nontranslationally invariant situation and the corresponding 
separation of the two phases. The fact that the number of particles is fixed 
determines, on the other hand, the fraction of volume occupied by each of 
the two phases. The phenomenon of phase transitions in the absence of an 
external field will be briefly discussed in §6.14. 

In the framework of the Ising model it will be convenient to discuss the
phenomenon of phase coexistence in the analog of the canonical ensemble
$\mathcal{U}(\Lambda, m)$, introduced and discussed in §6.2, where the total
magnetization $M = m|\Lambda|$ is held fixed.

To put ourselves in the phase transition region we shall take $\beta$ large
enough and, for a fixed $\alpha$, $0 < \alpha < 1$:

$$m = \alpha \, m^*(\beta) + (1 - \alpha)(-m^*(\beta)) = (1 - 2\alpha) \, m^*(\beta) \qquad (6.10.1)$$

i.e. we put ourselves in the vertical “plateau” of the diagram $(m,
h)_\beta$ (see Fig. 6.8.4 above).

Fixing m as in (6.10.1) does not yet determine the separation of the phases 
in two different regions; to obtain this effect it will be necessary to intro- 
duce some external cause favoring the occupation of a part of the volume 
by a single phase. Such an asymmetry can be obtained in at least two 
ways: through a weak uniform external field (in complete analogy with the 
gravitational field in the liquid-vapor transition) or through an asymmetric 
field acting only on the boundary spins. This second way should have the 
same qualitative effect as the former, because in a phase transition region 
a boundary perturbation produces volume effects (this last phenomenon, 
which has been investigated in the previous sections, is often also referred 
to as the “long-range order” of the correlations). 

From a mathematical point of view it is simple to use a boundary asym- 
metry to produce phase separations. 

To obtain a further, but not really essential, simplification of the problem
consider the two-dimensional Ising model with (+,−)-cylindrical or
(+,+)-cylindrical boundary conditions.

The spins adjacent to the bases of $\Lambda$ act as symmetry-breaking
external fields. The (+,+)-cylindrical boundary condition should, clearly,
favor the formation inside $\Lambda$ of the positively magnetized phase;
therefore it will be natural to consider, in the canonical ensemble, this
boundary condition only when the total magnetization is fixed to be
$+m^*(\beta)$ (see Fig. 6.8.4).

On the other hand, the boundary condition (+,−) favors the separation
of phases (positively magnetized phase near the top of $\Lambda$ and
negatively magnetized phase near the bottom).

Therefore it will be natural to consider this boundary condition in the
case of a canonical ensemble with magnetization $m = (1 - 2\alpha) \,
m^*(\beta)$ with $0 < \alpha < 1$, (6.10.1).

In this last case one expects, as already mentioned, the positive phase to
adhere to the top of $\Lambda$ and to extend, in some sense to be discussed,
up to a distance $O(L)$ from it; and then to change into the negatively
magnetized pure phase.

To make precise the above phenomenological description we shall describe
the spin configurations $\sigma \in \mathcal{U}(\Lambda, m)$ through the
associated sets of disjoint polygons (cf. §6.6).

Fix the boundary conditions to be (+,+) or (+,−)-cylindrical boundary
conditions and note that the polygons associated with a spin configuration
$\sigma \in \mathcal{U}(\Lambda, m)$ are all closed and of two types: the
ones of the first type, denoted $\gamma_1, \ldots, \gamma_n$, are polygons
which do not encircle $\Lambda$, the second type of polygons, denoted by the
symbols $\lambda_\alpha$, are the ones which wind up, at least once, around
$\Lambda$.

So a spin configuration $\sigma$ will be described by a set of polygons
$(\gamma_1, \ldots, \gamma_n, \lambda_1, \ldots, \lambda_h)$. It is,
perhaps, useful to remark once more that the configuration $\sigma$ will be
described by different sets of polygons according to which boundary
condition is used (among the ones we are considering, i.e. (+,+) or
(+,−)-boundary conditions). However, for a fixed boundary condition, the
correspondence between spin configuration and sets of disjoint closed
contours is one-to-one and the statistical weight of a configuration
$\sigma = (\gamma_1, \ldots, \gamma_n, \lambda_1, \ldots, \lambda_h)$ is
(cf. (6.6.4)):

$$e^{-2\beta J (\sum_i |\gamma_i| + \sum_j |\lambda_j|)}. \qquad (6.10.2)$$

It should also be remarked that the above notation is not coherent with
the notation of §6.6, where the symbol $\lambda$ is used for open polygons
(absent here); but this will not cause any confusion. The reason why we call
$\lambda$ the contours that go around the cylinder $\Lambda$ is that they
“look like” open contours if one forgets that the opposite sides of
$\Lambda$ have to be identified.

It is very important to remark that if we consider the (+,−)-boundary
conditions then the number of polygons of $\lambda$-type must be odd (hence
$\ne 0$), while if we consider the (+,+)-boundary condition then the number
of $\lambda$-type polygons must be even (hence it could be 0).


§6.11. Separation and Coexistence of Phases. Results


Bearing in mind the geometric description of the spin configuration in
the canonical ensembles considered with the (+,+)-cylindrical or the
(+,−)-cylindrical boundary conditions (which we shall denote briefly as
$\mathcal{U}^{++}(\Lambda, m)$, $\mathcal{U}^{+-}(\Lambda, m)$) we can
formulate the following theorem, [GM72b], essentially developed by Minlos
and Sinai to whom the very foundations of the microscopic theory of
coexistence are due:


Theorem. For $0 < \alpha < 1$ fixed let $m = (1 - 2\alpha) \, m^*(\beta)$;
then for $\beta$ large enough a spin configuration $\sigma = (\gamma_1,
\ldots, \gamma_n, \lambda_1, \ldots, \lambda_{2h+1})$ randomly chosen out of
$\mathcal{U}^{+-}(\Lambda, m)$ enjoys the properties (1)-(4) below with a
probability (in $\mathcal{U}^{+-}(\Lambda, m)$) approaching 1 as $\Lambda
\to \infty$:


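(1) $\sigma$ contains only one contour of $\lambda$-type and

$$\big| \, |\lambda| - (1 + \varepsilon(\beta)) L \, \big| < o(L) \qquad (6.11.1)$$

where $\varepsilon(\beta) > 0$ is a suitable ($\alpha$-independent) function
of $\beta$ tending to zero exponentially fast as $\beta \to \infty$.


(2) If $\Lambda_\lambda$, $\Lambda'_\lambda$ denote the regions above and
below $\lambda$ we have

$$\big| \, |\Lambda_\lambda| - \alpha |\Lambda| \, \big| < \kappa(\beta) \, |\Lambda|^{3/4} \qquad (6.11.2)$$

$$\big| \, |\Lambda'_\lambda| - (1 - \alpha) |\Lambda| \, \big| < \kappa(\beta) \, |\Lambda|^{3/4} \qquad (6.11.3)$$

where $\kappa(\beta) \to 0$ exponentially fast as $\beta \to \infty$.


(3) If $M_\lambda = \sum_{x \in \Lambda_\lambda} \sigma_x$, we have

$$\big| M_\lambda - \alpha \, m^*(\beta) \, |\Lambda| \big| < \kappa(\beta) \, |\Lambda|^{3/4} \qquad (6.11.4)$$

and a similar inequality holds for $M'_\lambda = \sum_{x \in
\Lambda'_\lambda} \sigma_x = m|\Lambda| - M_\lambda$.


(4) If $K^{\lambda}_{(\gamma)}(\sigma)$ denotes the number of contours
congruent to a given $\gamma$ and lying in $\Lambda_\lambda$ then,
simultaneously for all the shapes of $\gamma$:

$$\big| K^{\lambda}_{(\gamma)}(\sigma) - \rho(\gamma) \, \alpha |\Lambda| \big| \le C e^{-\beta J |\gamma|} \sqrt{|\Lambda|}, \qquad C > 0 \qquad (6.11.5)$$

where $\rho(\gamma) \le e^{-2\beta J |\gamma|}$ is the same quantity already
mentioned in the text of the theorem of §6.8. A similar result holds for the
contours below $\lambda$ (cf. the comments on (6.8.1)).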


It is clear that the above theorem not only provides a detailed and rather
satisfactory description of the phenomenon of phase separation, but also
furnishes a precise microscopic definition of the line of separation between
the two phases, which should be naturally identified with the (random) line
$\lambda$.

A very similar result holds in the ensemble $\mathcal{U}^{++}(\Lambda,
m^*(\beta))$: in this case (1) is replaced by


(1’) no $\lambda$-type polygon is present


while (2), (3) become superfluous and (4) is modified in the obvious way. In
other words a typical configuration in the ensemble
$\mathcal{U}^{++}(\Lambda, m^*(\beta))$ has the same appearance as a typical
configuration of the grand canonical ensemble $\mathcal{U}(\Lambda)$ with
(+)-boundary condition (which is described by the theorem of §6.8).

We conclude this section with a remark about the condition that $0 < \alpha
< 1$ has to be fixed beforehand in formulating the above theorem. Actually
the results of the theorem hold at fixed $\beta$ (large enough) for all the
$\alpha$'s such that $\varepsilon(\beta) < \min(\alpha, 1 - \alpha)$, i.e.
such that the line $\lambda$ cannot touch the bases of $\Lambda$ (in which
case there would be additional physical phenomena and correspondingly
different results).


§6.12. Surface Tension in Two Dimensions. Alternative Description of the
Separation Phenomena


A remarkable application of the above theorem is the possibility of giving 
a microscopic definition of surface tension between the two pure phases, 
[GM72b]. We have seen that the partition functions 


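    $Z^{++}(\Lambda, m^*(\beta)) = \sum_{\underline\sigma \in \mathcal U^{++}(\Lambda, m^*(\beta))} e^{-2\beta J\,(\sum_i |\gamma_i| + \sum_j |\lambda_j|)}$    (6.12.1)


and (if $m = (1-2\alpha)\,m^*(\beta)$, $0 < \alpha < 1$)


    $Z^{+-}(\Lambda, m) = \sum_{\underline\sigma \in \mathcal U^{+-}(\Lambda, m)} e^{-2\beta J\,(\sum_i |\gamma_i| + \sum_j |\lambda_j|)}$    (6.12.2)


will essentially differ, at low temperature, only because of the line $\lambda$ (present
in $\mathcal U^{+-}(\Lambda, m)$ and absent in $\mathcal U^{++}(\Lambda, m^*(\beta))$, see the preceding section).

A natural definition (in two dimensions) of surface tension between the
phases, based on obvious physical considerations, can therefore be given
in terms of the different asymptotic behavior of $Z^{++}(\Lambda, m^*(\beta))$ (or of the
grand canonical $Z^{++}(\Lambda, \beta)$) and $Z^{+-}(\Lambda, m)$: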


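    $\beta\tau(\beta) = \lim_{\Lambda\to\infty} \frac{1}{L}\,\log\frac{Z^{+-}(\Lambda, m)}{Z^{++}(\Lambda, m^*(\beta))}$    (6.12.3)

The above limit (which should be $\alpha$-independent for $\varepsilon(\beta) < \min(\alpha, 1-\alpha)$, cf.
the concluding remarks of the preceding section) can be exactly computed
at low enough temperature and is given by


    $\beta\tau(\beta) = -2\beta J - \log\tanh\beta J$    (6.12.4)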


which is the value computed by Onsager, [On44], by using a different defini- 
tion, not based on the above detailed microscopic description of the phases 
and of the line of separation: for a comparison of various old definitions of 
surface tension, new ones and a proof of their equivalence see [AGM71]. 
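
As a quick numerical check of (6.12.4), the following Python sketch evaluates its right-hand side as a function of K = βJ. The names `beta_tau` and `K_C` are ours and purely illustrative. With the sign convention of (6.12.3) the result is negative at low temperature, it approaches -2βJ up to a correction of order e^{-2βJ}, and it vanishes at the self-dual point sinh 2βJ = 1.

```python
import math

def beta_tau(K):
    """Right-hand side of (6.12.4) as a function of K = beta*J (K > 0)."""
    return -2.0 * K - math.log(math.tanh(K))

# Self-dual point sinh(2K) = 1, i.e. K_C = (1/2) log(1 + sqrt(2)).
K_C = 0.5 * math.log(1.0 + math.sqrt(2.0))

for K in (K_C, 0.6, 1.0, 2.0):
    # beta*tau + 2K is the entropic correction to the energy -2*beta*J per unit length
    print(f"beta*J = {K:.4f}   beta*tau = {beta_tau(K):+.6f}   "
          f"beta*tau + 2K = {beta_tau(K) + 2.0 * K:.6f}")
```

The last column is -log tanh βJ, which behaves as 2e^{-2βJ} at large β.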

We conclude this section with a brief discussion of one particular but very 
convenient alternative way of investigating the phenomenon of coexistence 
of two phases. Another still different way of investigating the phenomenon 
will be discussed in §6.14. 

Consider the grand canonical ensemble, but impose the following boundary
conditions: the spins adjacent to the upper half of the boundary of $\Lambda$ are
fixed to be +1, while the ones adjacent to the lower half are -1 (and no
periodicity condition). This is an $\varepsilon$-type boundary condition (see §6.3 and
Fig. 6.6.1, or also cover figure) generating an ensemble that we shall denote
by $\mathcal U^{+-}_{|}(\Lambda)$.

It is clear that a configuration $\underline\sigma \in \mathcal U^{+-}_{|}(\Lambda)$ is described, under the above
boundary condition, by one single open polygon $\lambda$ (surface in three dimen-
sions) going from one side of $\Lambda$ to the opposite side, and by a set of disjoint
closed polygons (polyhedra in three dimensions) $(\gamma_1,\ldots,\gamma_n)$.

The surface $\lambda$ now plays the role of the polygons encircling $\Lambda$ in the case
of cylindrical boundary conditions (and two dimensions) and it is also clear
that a theorem very similar to those already discussed should hold in this
case. The above point of view is more relevant in the three-dimensional case
where a “cylindrical” boundary condition would have a less clear physical
meaning, and it would rather look like a mathematical device.

In the three-dimensional case $\lambda$ is a “surface” with a boundary formed by
the square in the “middle” of $\partial\Lambda$ where the “break” between the spins fixed
to be +1 and the ones fixed to be -1 is located.

In the next section we investigate in more detail the structure of such a 
line or surface of separation between the phases. 


§6.13. The Structure of the Line of Separation. What a Straight
Line Really is


The theorem of §6.11 tells us that, if $\beta$ is large enough, then the line $\lambda$ is
almost straight (since $\varepsilon(\beta)$ is small). It is a natural question to ask whether
the line $\lambda$ is straight in the following sense: suppose that $\lambda$, regarded as
a polygon belonging to a configuration $\underline\sigma \in \mathcal U^{+-}(\Lambda, m)$ (cf. §6.11), passes
through a point $q \in \Lambda$; then we shall say that $\lambda$ is “straight” or “rigid” if the
(conditional) probability $P_\lambda$ that $\lambda$ passes also through the site $q'$, opposite$^3$
to $q$ on the cylinder $\Lambda$, does not tend to zero as $\Lambda \to \infty$; otherwise we shall
say that $\lambda$ is not rigid or fluctuates. Of course the above probabilities must
be computed in the ensemble $\mathcal U^{+-}(\Lambda, m)$.

Alternatively (and essentially equivalently) we can consider the ensemble
$\mathcal U^{+-}_{|}(\Lambda)$ (see §6.12, i.e. the grand canonical ensemble with the
boundary spins set to +1 in the upper half of $\partial\Lambda$, vertical
sites included, and to -1 in the lower half). We say that $\lambda$ is rigid if the
probability that $\lambda$ passes through the center of the box $\Lambda$ (i.e. 0) does not
tend to 0 as $\Lambda \to \infty$; otherwise it is not rigid.

It is rather clear what the above notion of rigidity means: the “excess”
length $\varepsilon(\beta)L$, see (6.11.1), can be obtained in two ways: either the line $\lambda$ is
essentially straight (in the geometric sense) with a few “bumps” distributed
with a density of order $\varepsilon(\beta)$ or, otherwise, the line $\lambda$ is bent and, therefore,
only locally straight and part of the excess length is gained through the
bending.

In three dimensions a similar phenomenon is possible. As remarked at the
end of the last section, in the ensemble $\mathcal U^{+-}_{|}(\Lambda)$, in this case $\lambda$ becomes a
surface with a square boundary fixed at a certain height (i.e. 0), and we ask
whether the center of the square belongs to $\lambda$ with non vanishing probability
in the limit $\Lambda \to \infty$.


$^3$ I.e. on the same horizontal line and L/2 sites apart.

The rigidity or otherwise of $\lambda$ can, in principle, be investigated by optical
means; one can have interference of coherent light scattered by surface el-
ements of $\lambda$ separated by a macroscopic distance only if $\lambda$ is rigid in the
above sense.

It has been rigorously proved that, at least at low temperature, the line of
separation $\lambda$ is not rigid in two dimensions (and the fluctuation of the middle
point is of the order $O(\sqrt{|\Lambda|})$); a very detailed description of the separation
profile is available, at low temperature ([Ga72a], [GV72], [Ga72b]) and even
all the way to the critical point [AR76]. In three dimensions the situation is
very different: it has been shown that the surface $\lambda$ is rigid at low enough
temperature, see [Do72], [VB75]. The latter reference provides a very nice
and simple argument for the three-dimensional rigidity.
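
The vanishing of the probability that the line passes through a prescribed point can be seen in a drastically simplified caricature, which is not the Ising computation of the references quoted above. We assume (purely for illustration) that the line is a path whose height changes by +1 or -1 at each of its 4k horizontal steps and which is back at its initial height at the end, all such paths being equally likely. The probability of finding the path at its initial height at the midpoint is then a ratio of binomial coefficients; the function name below is ours.

```python
import math

def midpoint_return_probability(k):
    """Probability that a +-1 bridge of 4k steps is at height 0 after 2k steps.

    A bridge is a path that starts and ends at height 0; all bridges are taken
    to be equally likely. The path is at 0 at the midpoint exactly when both
    of its halves are themselves bridges of 2k steps.
    """
    half = math.comb(2 * k, k)
    return half * half / math.comb(4 * k, 2 * k)

for k in (1, 10, 100, 1000):
    p = midpoint_return_probability(k)
    # p * sqrt(k) settles to a constant: the probability decays like k**(-1/2)
    print(f"steps = {4 * k:5d}   P(midpoint at 0) = {p:.5f}   P*sqrt(k) = {p * math.sqrt(k):.4f}")
```

By Stirling's formula the product P·√k tends to √(2/π), so that in this toy model the line is not rigid in the sense defined above.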

An interesting question remains open in the three-dimensional case and
is the following: it is conceivable that the surface, although rigid at low
temperature, might become loose at a temperature $\tilde T_c$ smaller than the
critical temperature $T_c$ (the latter being defined as the highest temperature
below which there are at least two pure phases). The temperature $\tilde T_c$, if
it exists, is called the “roughening transition” temperature, see [KM86],
[VB77], [KM87], [VN87].

It would be interesting to examine the available experimental data on the
structure of the surface of separation to set limits on $T_c - \tilde T_c$ in the case of the
liquid-gas phase transition where an analogous phenomenon can conceivably
occur even though a theory of it is far from being in sight, at least if one
requires a degree of rigor comparable to that in the treatment of the results
so far given for the Ising model.

We conclude by remarking that the rigidity of $\lambda$ is connected with the
existence of translationally noninvariant equilibrium states (see §6.3).

The discussed nonrigidity of $\lambda$ in two dimensions provides the intuitive
reason for the absence of translationally noninvariant states.

Note that the existence of translationally noninvariant equilibrium states 
is not necessary for the description of coexistence phenomena. The theory 
of the two-dimensional Ising model developed in the preceding sections is a 
clear proof of this statement. 


§6.14. Phase Separation Phenomena and Boundary Conditions.
Further Results


The phenomenon of phase separation described in §6.12 and §6.13 is the 
ferromagnetic analogue of the phase separation between a liquid and a vapor 
in the presence of the gravitational field. 

It is relevant to ask to what extent an external field (or some equivalent
boundary condition) is really necessary; for instance one could imagine a
situation in which two phases coexist in the absence of any external field.

Let us discuss first some phenomenological aspects of the liquid-gas phase
separation in the absence of external fields. One imagines that, if the den-
sity is fixed and corresponds to some value on the “plateau” of the phase
diagram, see Fig. 5.1.1, then the space will be filled by vapor and drops of
liquid in equilibrium. Note that the drops will move and, from time to time,
collide; since the surface tension is negative the drops will tend to cluster
together and, eventually, in an equilibrium situation there will be just one
big drop (and the drop surface will be minimal). The location of the drop
in the box $\Lambda$ will depend on how the walls are made and how they interact
with the particles within $\Lambda$.

Let us consider some extreme cases: 


(1) the walls “repel” the drops, 
(2) the walls “attract” the drops, 


(3) the wall is perfect and does not distinguish between the vapor and the 
liquid. 


In the first case the drop will stay away from the boundary $\partial\Lambda$ of $\Lambda$. In the
second case the drop will spread on the walls, which will be wet as much
as possible. In the third case it will not matter where the drop is; the drop
will be located in a position that minimizes the “free” part of its boundary
(i.e. the part of the boundary of the drop not on $\partial\Lambda$). This means that the
drop will prefer to stay near a corner rather than wetting all the wall.

Let us translate the above picture into the Ising model case. Assume
that $\beta$ is large and $m = (1-2\alpha)\,m^*(\beta)$ (see Fig. 6.8.4) (i.e. assume that
the magnetization is on the vertical plateau of the $(m, h)_\beta$ diagram in Fig.
6.8.4).

Then the conditions (1), (2), (3) can be realized as follows: 


(1) The spins adjacent to the boundary are all fixed to be +1. This favors 
the adherence to the boundary of the positively magnetized phase. 


(2) The spins adjacent to the boundary are all fixed to be -1. This favors
the adherence to the boundary of the negatively magnetized phase. 


(3) There are no spins adjacent to the boundary, i.e. we consider perfect 
wall or open (or free) boundary conditions (see §6.3). 


The rigorous results available in the case of the Ising model confirm the 
above phenomenological analysis of liquid-vapor coexistence [MS67]: 


Theorem. Fix $0 < \alpha < 1$ and consider (+)-boundary conditions. Then a
spin configuration $\underline\sigma$ randomly extracted from the canonical ensemble with
magnetization $m = (1-2\alpha)\,m^*(\beta)$ has, if $\beta$ is large enough, properties
(1)-(3) below with a probability tending to 1 as $\Lambda \to \infty$.


(1) There is only one $\gamma$ such that $|\gamma| > \frac{1}{333}\log|\Lambda|$ and it has the property$^{16}$


    $\big|\,|\gamma| - 4\sqrt{(1-\alpha)|\Lambda|}\,\big| \le \delta(\beta)\,\sqrt{|\Lambda|}$    (6.14.1)


with $\delta(\beta) \to 0$ as $\beta \to \infty$ (exponentially fast);


(2) The area enclosed by $\gamma$ is $\theta(\gamma)$:


    $\big|\,|\theta(\gamma)| - (1-\alpha)|\Lambda|\,\big| \le \kappa(\beta)\,|\Lambda|^{3/4}$    (6.14.2)


(3) The magnetization $M(\theta(\gamma))$ inside $\gamma$ is on the average equal to $-m^*(\beta)$
and, more precisely,


    $\big|\,M(\theta(\gamma)) + m^*(\beta)\,(1-\alpha)\,|\Lambda|\,\big| \le \kappa(\beta)\,|\Lambda|^{3/4}$    (6.14.3)


and, therefore, the average magnetization outside $\theta(\gamma)$ is $+m^*(\beta)$.


This theorem also holds in three dimensions but, of course, the exponent
of $|\Lambda|$ in (6.14.1) changes (from $\frac12$ to $\frac23$).

The above theorem shows that a typical configuration consists of a pos-
itively magnetized pure phase adherent to the boundary and of a “drop”
of negatively magnetized phase not adhering to the boundary (since $\gamma$ is
closed). The size of the drop is $\sim \sqrt{(1-\alpha)|\Lambda|}$ (as it should be).

Note that the drop is almost square in shape (as follows from (6.14.1), 
(6.14.2)): this should not be astonishing since the space is discrete and 
the isoperimetric problem on a square lattice has the square as a solution 
(rather than a circle). 
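
The lattice isoperimetric statement can be checked on a small example. The sketch below (helper names are ours) measures the contour length of a set of unit cells as the number of lattice edges separating it from its complement, and compares a lattice disc of radius 10 with a square of side 18, whose area is slightly larger.

```python
def boundary_length(cells):
    """Number of unit lattice edges between a cell of `cells` and a cell outside it."""
    cells = set(cells)
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return sum((x + dx, y + dy) not in cells for (x, y) in cells for dx, dy in steps)

def square(side):
    """Cells of a side x side square."""
    return {(x, y) for x in range(side) for y in range(side)}

def disc(radius):
    """Cells (x, y) with x^2 + y^2 <= radius^2."""
    r = range(-radius, radius + 1)
    return {(x, y) for x in r for y in r if x * x + y * y <= radius * radius}

d, s = disc(10), square(18)
print("disc:   area", len(d), " contour length", boundary_length(d))
print("square: area", len(s), " contour length", boundary_length(s))
```

The square encloses at least as many cells with a shorter contour: on the lattice, the contour of a region whose rows and columns are all intervals is as long as the perimeter of its bounding rectangle, so that rounding the corners removes area without shortening the contour.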

The opposite situation is found if one fixes a (-)-boundary condition; a
square drop forms in the middle of the box with side $\sim \sqrt{\alpha|\Lambda|}$ and average
magnetization $m^*(\beta)$.

Finally if the boundary condition is of perfect wall type ($\mathcal B_\Lambda(\underline\sigma) = 0$), then
the above theorem does not hold and one can prove (say, in two dimen-
sions) that a typical spin configuration has just one open contour $\lambda$ (with
ends on $\partial\Lambda$) which separates the space in two parts which are occupied by
opposite phases; the line $\lambda$ should be the shortest possible compatible with
the condition that the volume $\Lambda$ is divided by it into two regions of vol-
ume essentially $\alpha|\Lambda|$ and $(1-\alpha)|\Lambda|$ (respectively occupied by the positively
magnetized phase and by the negatively magnetized phase): see [Ku83].

If one interprets the spins equal to +1 as particles and the spins equal to -1
as empty sites, then one has a lattice gas model which undergoes a liquid- 
vapor phase transition presenting the phenomenological aspects outlined at 
the beginning of this section for these transitions. 


$^{16}$ The number 333 is just an arbitrary constant and it is reported here because it appeared
in the original literature, [MS67], as a joke referring to the contemporary papers on the
KAM theorem (“Moser’s constant”). In fact it looks today somewhat confusing and
quite strange: the modern generation do not seem to appreciate this kind of humour
any more; they became more demanding and would rather ask here for the “best”
constant; this is my case as well.

To conclude we remark that, in the phase separation phenomenon, the
finiteness of the box only plays the role of fixing the density. The detailed
structure of the phenomenon depends on the boundary conditions which,
in experimental situations, turn out to be something intermediate between
the three extreme cases discussed above.

Note that (6.14.1) does not provide a satisfactory estimate of $|\gamma|$ since the
allowed error is still of the order of $\sqrt{|\Lambda|}$; but better estimates can be ob-
tained to determine exactly (i.e. with an error much smaller than $O(\sqrt{|\Lambda|})$)
the size of the boundary and its macroscopic shape. Such remarkable results
also provide a rigorous microscopic theory of the ancient Wulff’s construc-
tion of the shape of a droplet in the absence of gravity when the surface ten-
sion is not spherically symmetric as a function of the normal to the droplet
surface, see [DKS92], [PV96], [MR94], [Mi95]. The results hold in the two
dimensional case and for low enough temperature: their literal extension to
the whole coexistence region or to the three-dimensional case (even at very
low temperature) seems out of reach, if at all possible, of present day tech-
niques: in this respect, unexpected, remarkable progress has been achieved
very recently, [PV99], with new techniques that cover, at least in 2 dimen-
sions, the whole phase-coexistence region (showing that despair is out of
place). However one can get surprisingly detailed information by general
considerations based on inequalities and convexity properties of the surface
tension, see [MMR92].

Another problem is the investigation of the dependence of the correlation 
functions on the distance from the surface of the drop. 

The analogs of the first two questions just raised were previously satis- 
factorily answered in the two-dimensional Ising model with the “easier” 
cylindrical boundary conditions (see §6.11), i.e. in the case of an “infinite”
drop with a flat surface. This problem has been approximately studied even 
in the case of a flat drop, [BF67]. 


§6.15. Further Results, Some Comments and Some Open Problems


In §6.14 we dealt with the case of a nearest neighbor Ising model. It 
has become customary, in the literature, to apply the name of Ising model 
to more general models in which the “bulk” Hamiltonian (i.e. without the 
boundary interactions and conditions) has the form, see §5.10, 


    $-h\sum_i \sigma_{x_i} - \sum_{i<j} J_2(x_i, x_j)\,\sigma_{x_i}\sigma_{x_j} - \sum_{i<j<k} J_3(x_i, x_j, x_k)\,\sigma_{x_i}\sigma_{x_j}\sigma_{x_k} - \ldots$    (6.15.1)

where the potentials $J_n(x_1,\ldots,x_n)$ are translationally invariant functions
of $(x_1,\ldots,x_n)$ and satisfy certain restrictions of the type

    $\sum_{x} |J_2(0,x)| + \sum_{x,y} |J_3(0,x,y)| + \ldots < +\infty$.    (6.15.2)

If only pair potentials are present, i.e. if the bulk Hamiltonian has the form:


    $-h\sum_i \sigma_{x_i} - \sum_{i<j} J(x_i - x_j)\,\sigma_{x_i}\sigma_{x_j}$    (6.15.3)


and if J(r) < 0, then most of the results described in this chapter and ap- 
propriately reformulated can be extended or become very reasonable con- 
jectures, see [Ru69], p. 125, [MS67], [BS67], [Do68c] and the review [Gi70]. 

Many results remain true for more general pair potentials and for other 
models (like continuous gases) at least from the qualitative point of view; 
in fact it is reasonable that the results selected here for discussion should 
have, at least qualitatively, an analog in the “general” case of a classical (as 
opposed to quantum) phase transition. 

Results such as analyticity and absence of phase transitions at high tem- 
perature, or exact solutions, are a peculiarity of lattice models and have 
been partly discussed in Chap.V and they will be analyzed again later. 

Below I list a number of rather randomly chosen and interesting problems 
suggested by the topics of this chapter. 


(1) The solution of the two-dimensional Ising model is based on the so
called “transfer matrix”. The investigation of the transfer matrix has been
pursued in some detail in the case of periodic or open boundary conditions
in two or three dimensions, [MS70], [CF71], see also [On44], [Ab71].
The transfer matrix with non-symmetric boundary conditions has also been
studied in the two-dimensional case, [Ab84], i.e. the transfer matrix between
two rows (or planes) where the line (or surface) of separation should pass
if straight. A qualitative difference should arise between two and three
dimensions (see, for more details, §7.1).


(2) In Fig. 6.8.4 we see that the isotherm $m(\beta,h)$ as a function of h > 0
abruptly ends at h = 0. It is a natural question whether h = 0 is an analytic
singularity of $m(\beta,h)$ or whether $m(\beta,h)$ can be analytically continued to
h < 0. There has always been strong evidence for a singularity, [LR69], and
it has been shown, rigorously, that at h = 0 there is an essential singularity,
at least at large $\beta$, although the function $m(\beta,h)$ is infinitely differentiable
as a function of h for h > 0, [Is84].


(3) The answer to (2) makes clear that one has to give up the theory of
“metastability” based on the possibility of an analytic continuation of the
magnetization as a function of h through h = 0. The latter idea was founded
in fact on the absence of an analytic singularity at h = 0 in the equation
of state deduced from mean field theory (that is in the van der Waals the-
ory of phase transitions, whose version for spin systems is called the Curie-
Weiss theory): for an interesting mathematically complete treatment of the
metastability phenomenon in the case of very weak and very long ranged
forces see [LP79]. The question of how one can explain metastability phe-
nomena in systems with short range forces has been investigated in great
detail as a dynamical phenomenon and the results are very many, detailed
and varied, see for instance [CCO74], [KO93], [MOS90].


(4) There is a great number of other lattice models for which phase transi- 
tions are proven to take place; the basic techniques originated by the work 
of Minlos and Sinai, [MS67], have evolved and have been highly enriched; 
one can compare the situation in the early 1970s, before the first proof of 
the existence of a phase transition in a continuous system, still based on 
symmetry breaking [Ru73b], see the review [Gi70], with the present day
very refined results (see [BLPO79], [BC94]). In this respect one must men-
tion the very recent progress in nonlattice models, [Jo95] and [MLP98].
The latter works should be considered major breakthroughs on the problem 
of phase transitions occurring in continuous systems or, more generally, in 
systems without special symmetries which are spontaneously broken at the 
transition. 


(5) The question of whether, once a phase transition is known to occur, 
one can count how many pure phases exist is often a very intricate question 
as various examples show (see [BC94]). 


(6) A detailed description of the correlation functions near the line or 
surface of separation has still to be discussed, see [AR76]. 


(7) The microscopic definition of surface tension in the particular case of 
the three-dimensional Ising model has been studied but there are many open 
problems, see [Mi95], particularly concerning the cases when the boundary 
conditions would impose an ideal surface of separation, between the two 
phases, which is not parallel to the lattice planes. Furthermore it is a 
well founded conjecture that there is a temperature lower than the critical 
temperature for the appearance of spontaneous magnetization, above which 
the separation surface shows large fluctuations (possibly of order $\sqrt{\log L}$),
see [VB77]. In this regime there would probably be no more translationally
noninvariant states, and it is likely that the surface tension $\tau(\beta)$ is not
analytic as a function of $\beta$ (while at low temperature it is known that
the surface tension relative to an ideal surface of separation parallel to the
lattice planes is such that $\tau(\beta) + 2\beta J$ is analytic in $e^{-\beta J}$). This would
identify a second type of phase transition which has been called in §6.13 the 
roughening transition, see [KMS86], [KM87] and, for a review, [VN87]. 


(8) The problem of the existence of phase transitions in models close to
symmetric models but asymmetric was expected to give interesting results,
[Me71]. Substantial progress towards the understanding of phase transitions
not directly associated with spontaneous breakdown of symmetry has been
achieved by the understanding of the model in (6.15.1) with $J_3 \neq 0$, [PS76].
Although the models in [PS76] are “close” to symmetric models the absence
of a rigorous symmetry was a major obstacle and the solution proposed
in [PS76] has generated a large number of investigations: the theory is
generally known as the Pirogov-Sinai theory of phase coexistence, see for
instance the applications to Potts’ models, [LMMRS91], [MMRS91].


(9) In connection with (8) an interesting problem arises on the correct 
definition of approximate symmetry. The analysis has been attempted in 
[EG75] but the results are still very partial. There is also the possibility 
that in some cases symmetries that are apparently already broken in the 
Hamiltonian are in fact dynamically restored (and then, possibly, sponta- 
neously broken). An example in which this happens has been proposed in 
[BG97] (see [BGG97] and [Ga98b] for some applications) in a totally dif- 
ferent context, but its relevance for the theory of phase transitions might 
create surprises. 


(10) Last but not least, the phase transitions problem in quantum statis-
tical mechanics will not be discussed in this book, but it is, of course, very
important. The conceptual frame in which it is developed in the literature
is the same as the one we described in the classical case. However the phe-
nomenology becomes even richer and full of surprises: see [BCS57], [Br65].
Phase transitions of quantum systems can be studied in “lattice systems”
at a rather sophisticated mathematical level, [Gi69b], [DLS78], and we shall
meet some in Chap.VII where they appear because of their relation with
transitions in classical spin models. However the theory is, generally speak-
ing, less developed on a mathematical level but it is enormously developed
on a phenomenological level, see [An84].


§7.1. Transfer Matrix in the Ising Model: Results in d = 1,2


Many well-known exact solutions of statistical mechanical models are based 
on the transfer matrix method. In fact the summation over the states of the 
system, e.g. the sum over the values of the spin at each site, can be quite 
easily interpreted as an operation of summing over the labels of a product 
of matrices in order to compute the trace of the product. Thus the problem 
of computing, say, a partition function is “reduced” to that of diagonalizing 
certain matrices with the purpose of computing their eigenvalues and eigen-
vectors. The latter sometimes also provide information on the correlation
functions. 

The difficulty is that the matrices that one obtains are “large dimension 
matrices” and they are difficult to diagonalize. This can be done, at the 
cost of remarkable effort, in a few cases. We shall discuss a few samples of 
them. The simplest case is the d = 1 Ising model already studied in §6.4 
with a different method and at zero field h. 

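Consider the one dimensional Ising model with periodic boundary condi-
tions, see (6.2.1), Ch.VI. If $\sigma_{L+1} \equiv \sigma_1$ the partition function $Z(\Lambda, \beta, h)$ can
be written as


    $Z(\Lambda, \beta, h) = \sum_{\sigma_1,\ldots,\sigma_L} \prod_{i=1}^{L} e^{\frac{\beta h}{2}\sigma_i}\, e^{\beta J \sigma_i \sigma_{i+1}}\, e^{\frac{\beta h}{2}\sigma_{i+1}} =$
    $= \sum_{\sigma_1,\ldots,\sigma_L} V_{\sigma_1\sigma_2} V_{\sigma_2\sigma_3} \cdots V_{\sigma_L\sigma_1} = \mathrm{Tr}\, V^L$    (7.1.1)


where $V$ is a two-by-two matrix such that ($\sigma, \sigma' = \pm 1$)

    $V_{\sigma\sigma'} = e^{\frac{\beta h}{2}\sigma}\, e^{\beta J \sigma\sigma'}\, e^{\frac{\beta h}{2}\sigma'}, \qquad V = \begin{pmatrix} e^{\beta(h+J)} & e^{-\beta J} \\ e^{-\beta J} & e^{-\beta(h-J)} \end{pmatrix}$    (7.1.2)


If $\lambda_+ > \lambda_-$ are the two eigenvalues of $V$, we find


    $Z(\Lambda, \beta, h) = \lambda_+^L + \lambda_-^L$    (7.1.3)

so that

    $\beta f(\beta, h) = \lim_{L\to\infty} \frac{1}{L} \log Z(\Lambda, \beta, h) = \log \lambda_+$.    (7.1.4)


It is easy to check that $\lambda_+(\beta, h)$ is analytic in $\beta$ and $h$ for $0 < \beta < \infty$
and $-\infty < h < \infty$, i.e. there are no phase transitions (as singularities of
$f(\beta, h)$). In fact


    $\beta f(\beta, h) = \log\Big( e^{\beta J} \cosh\beta h + \big( e^{2\beta J} (\sinh\beta h)^2 + e^{-2\beta J} \big)^{1/2} \Big)$    (7.1.5)

as the elementary calculation of $\lambda_+$ shows, from (7.1.2). This is manifestly
an analytic function of $\beta, h$ in the region of physical interest ($\beta, h$ real) so
that the model has no phase transitions in the sense of no singularities of
the thermodynamic functions or of their derivatives, as already discussed.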
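
The identities (7.1.1) and (7.1.3), together with the closed form of the eigenvalues behind (7.1.5), can be verified numerically on a short chain. The following Python sketch uses only the standard library; all function names are ours and purely illustrative. It compares the direct sum over the 2^L configurations with Tr V^L and with the sum of the L-th powers of the two eigenvalues.

```python
import itertools
import math

SPINS = (+1, -1)

def transfer_matrix(beta, J, h):
    """2x2 matrix V of (7.1.2); rows and columns ordered as sigma = +1, -1."""
    return [[math.exp(beta * (0.5 * h * s + J * s * t + 0.5 * h * t)) for t in SPINS]
            for s in SPINS]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def trace_of_power(V, L):
    """Tr V^L by repeated multiplication (L >= 1)."""
    P = V
    for _ in range(L - 1):
        P = matmul(P, V)
    return sum(P[i][i] for i in range(len(P)))

def brute_force_Z(beta, J, h, L):
    """Direct sum over the 2^L configurations of the periodic chain."""
    Z = 0.0
    for s in itertools.product(SPINS, repeat=L):
        minus_H = sum(J * s[i] * s[(i + 1) % L] + h * s[i] for i in range(L))
        Z += math.exp(beta * minus_H)
    return Z

def eigenvalues(beta, J, h):
    """lambda_+ and lambda_- of V in closed form."""
    a = math.exp(beta * J) * math.cosh(beta * h)
    b = math.sqrt(math.exp(2 * beta * J) * math.sinh(beta * h) ** 2 + math.exp(-2 * beta * J))
    return a + b, a - b

beta, J, h, L = 0.7, 1.0, 0.3, 8
lam_plus, lam_minus = eigenvalues(beta, J, h)
print("sum over configurations :", brute_force_Z(beta, J, h, L))
print("Tr V^L                  :", trace_of_power(transfer_matrix(beta, J, h), L))
print("lambda_+^L + lambda_-^L :", lam_plus ** L + lam_minus ** L)
```

The three numbers agree up to rounding errors; the values of β, J, h and L are arbitrary.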
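A similar method can be applied to the two-dimensional Ising model ($\Lambda$ is
now an $M \times N$ box). Suppose, for simplicity, h = 0, then $Z(\Lambda, \beta)$ is


    $\sum_{\underline\sigma} \prod_{i=1}^{M} \prod_{j=1}^{N} e^{\beta J \sigma_{i,j}\sigma_{i+1,j} + \beta J \sigma_{i,j}\sigma_{i,j+1}} =$
    $= \sum_{\underline\sigma_1,\ldots,\underline\sigma_M} \prod_{i=1}^{M} \Big\{ \prod_{j=1}^{N} e^{\beta J (\sigma_{i,j}\sigma_{i+1,j} + \sigma_{i,j}\sigma_{i,j+1})} \Big\}$    (7.1.6)

where in the second line we denote by $\underline\sigma_i = (\sigma_{i,1},\ldots,\sigma_{i,N})$ all the spins on
the i-th row of $\Lambda$; the periodic boundary conditions are imposed by setting
$\underline\sigma_1 \equiv \underline\sigma_{M+1}$ and $\sigma_{i,1} \equiv \sigma_{i,N+1}$. Clearly, if we define the $2^N \times 2^N$ matrix

    $V_{\underline\sigma,\underline\sigma'} = \prod_{j=1}^{N} e^{\frac{\beta J}{2}\sigma_j\sigma_{j+1}}\, e^{\beta J \sigma_j\sigma'_j}\, e^{\frac{\beta J}{2}\sigma'_j\sigma'_{j+1}} =$
    $= \exp \sum_{j=1}^{N} \Big( \frac{\beta J}{2}\sigma_j\sigma_{j+1} + \beta J \sigma_j\sigma'_j + \frac{\beta J}{2}\sigma'_j\sigma'_{j+1} \Big)$    (7.1.7)

where $\sigma_1 \equiv \sigma_{N+1}$, $\sigma'_1 \equiv \sigma'_{N+1}$, we realize that

    $Z(\Lambda, \beta) = \mathrm{Tr}\, V^M$    (7.1.8)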
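
The relation (7.1.8) can also be checked on a very small torus. The sketch below assumes the row-symmetric form of the transfer matrix, in which the couplings inside a row are shared equally between the two matrix indices; the function names are hypothetical. It builds the 2^N × 2^N matrix for N = 3 and compares Tr V^M with the brute-force sum over all the 2^{MN} configurations.

```python
import itertools
import math

def row_configurations(N):
    """All 2^N spin configurations of a row of N sites."""
    return list(itertools.product((+1, -1), repeat=N))

def transfer_matrix_2d(beta, J, N):
    """Row-to-row transfer matrix at h = 0, periodic along the row (N >= 3)."""
    rows = row_configurations(N)

    def within(s):       # sum_j s_j s_{j+1}, with s_{N+1} = s_1
        return sum(s[j] * s[(j + 1) % N] for j in range(N))

    def between(s, t):   # sum_j s_j t_j
        return sum(a * b for a, b in zip(s, t))

    return [[math.exp(beta * J * (0.5 * within(s) + between(s, t) + 0.5 * within(t)))
             for t in rows] for s in rows]

def trace_of_power(V, M):
    """Tr V^M by repeated multiplication (M >= 1)."""
    n = len(V)
    P = V
    for _ in range(M - 1):
        P = [[sum(P[i][k] * V[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return sum(P[i][i] for i in range(n))

def brute_force_Z(beta, J, M, N):
    """Direct sum over all configurations of the M x N torus (M, N >= 3)."""
    Z = 0.0
    for flat in itertools.product((+1, -1), repeat=M * N):
        s = [flat[i * N:(i + 1) * N] for i in range(M)]
        bonds = sum(s[i][j] * s[(i + 1) % M][j] + s[i][j] * s[i][(j + 1) % N]
                    for i in range(M) for j in range(N))
        Z += math.exp(beta * J * bonds)
    return Z

beta, J, M, N = 0.4, 1.0, 3, 3
print("sum over configurations :", brute_force_Z(beta, J, M, N))
print("Tr V^M                  :", trace_of_power(transfer_matrix_2d(beta, J, N), M))
```

The size of the matrix grows as 2^N, which is the “large dimension” difficulty mentioned at the beginning of this section: the brute-force check is feasible only for very small boxes.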


We have dealt so far only with periodic boundary conditions. We could 
introduce transfer matrices also in the case of other boundary conditions. 
For instance, assume, for simplicity, that there are periodic boundary con- 
ditions along the columns; we shall consider the three cases below: 


(1) “Perfect wall” or “open” boundary conditions, see §6.3, along the rows; 


(2) Boundary conditions on the rows corresponding to the existence of fixed
spins $\varepsilon_i = +1$ (or $\varepsilon_i = -1$) for all the i’s on the lattice sites adjacent to the
end points of the rows;


(3) Boundary conditions which are of the same type as in (2) but half the
rows end in positive spins (say the upper half) and half in negative spins.

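We shall now write down a transfer matrix expression for $Z(\Lambda, \beta)$ in the
above cases. In case (1) $Z(\Lambda, \beta) = \mathrm{Tr}\,(V^{(0)})^M$ where:


    $V^{(0)}_{\underline\sigma,\underline\sigma'} = \exp\Big( \sum_{j=1}^{N-1} \frac{\beta J}{2}\,(\sigma_j\sigma_{j+1} + \sigma'_j\sigma'_{j+1}) + \sum_{j=1}^{N} \beta J\,\sigma_j\sigma'_j \Big)$    (7.1.9)


In case (2) $Z(\Lambda, \beta) = \mathrm{Tr}\,(V^{(\pm)})^M$ where:


    $V^{(\pm)}_{\underline\sigma,\underline\sigma'} = V^{(0)}_{\underline\sigma,\underline\sigma'}\; e^{\pm\frac{\beta J}{2}(\sigma_1 + \sigma_N + \sigma'_1 + \sigma'_N)}$.    (7.1.10)


In case (3), assuming here that the height of $\Lambda$ is $M$ with $M$ even, we have
that $Z(\Lambda, \beta) = \mathrm{Tr}\,(V^{(+)})^{\frac{M}{2}-1}\, V^{(+-)}\, (V^{(-)})^{\frac{M}{2}-1}\, V^{(-+)}$ with


    $V^{(\pm\mp)}_{\underline\sigma,\underline\sigma'} = V^{(0)}_{\underline\sigma,\underline\sigma'}\; e^{\pm\frac{\beta J}{2}(\sigma_1 + \sigma_N)\,\mp\,\frac{\beta J}{2}(\sigma'_1 + \sigma'_N)}$.    (7.1.11)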


The transfer matrix $V$ in (7.1.7) is the matrix that was diagonalized in the
famous paper of Onsager, [On44]. The matrix $V^{(0)}$ has also been diagonal-
ized exactly in [Ab71].

The matrices $V^{(\pm)}$ have been studied and diagonalized in [AM73]. Many
more exact calculations of interesting quantities have been performed, see
for instance [Ab71], [MW73], [Ab78a], [Ab78b], [Ab84].

The problem of computing the partition function can be formulated sim-
ilarly in the three dimensional case. Some very interesting results on the
spectral properties of the generalization to three dimensions of the matrix $V$
(periodic boundary conditions) have been obtained in [MS70], [CF71].

In three dimensions one expects that the analogue of $V^{(+-)}$ (in contrast
to $V^{(0)}$, $V^{(\pm)}$) has spectral properties which differ radically from those of
$V$. In two dimensions the phenomenon should not occur and all the above
matrices should have the same spectrum (asymptotically as $\Lambda \to \infty$). As
mentioned in §6.15, problem (1), this should be related to the fact that
$V^{(+-)}$ should contain some information about the rigidity of the line or
surface of phase separation (which is “rigidly sitting” right near the two
lines between which $V^{(+-)}$ “transfers”, see Ch.VI).

A very interesting heuristic analysis of the spin correlation functions in 
terms of the transfer matrix has been done in [CF71]. 




§7.2. Meaning of Exact Solubility and the Two-Dimensional Ising
Model


Before proceeding to study more interesting cases it is necessary to say 
that usually by “exactly soluble” one means that the free energy or some 
other thermodynamic function can be computed in terms of one or more 
quadratures, i.e. in terms of a finite-dimensional integral, whose dimension 
is independent of the system size. 

In some cases one can even compute a few correlation functions: but there 
remain, as a rule, quite a few physically interesting quantities that one 
cannot compute (in the above sense of computing). 

Another characteristic problem of the “solutions” is that sometimes their
evaluation (in terms of quadratures) involves a few exchanges of limits that
are not always easy to justify. For instance the value of the spontaneous
magnetization in the Ising model was derived in an unknown way by On-
sager (who just wrote the final formula, (7.2.2) below, on a blackboard at a
meeting in Firenze, [KO49]) and later by Yang, [Ya52], but both derivations
relied on some hidden assumptions (in modern language the first assumed 
that there could be only two pure phases and the second that an external 
field of order O(+) would be strong enough to force the system into a pure 
magnetized phase). 

In the case of the Ising model spontaneous magnetization the mathematical 
problems were solved much later, independently in [BGJS73] and [AM73]. 
But in several other cases there are still open problems in establishing the 
validity of the “exact solutions” with full rigor (in the current sense of the 
word). 

The two-dimensional model is soluble only in zero external field (h = 0) in
the sense that in this case the free energy $f_2(\beta)$, the magnetization $m(\beta)$
and a few correlation functions can be expressed quite simply. For instance
if $J^* = J^*(\beta)$ is defined by $\tanh\beta J^*(\beta) = e^{-2\beta J}$, see (6.6.6), (6.6.9), and
denoting $\cosh^{-1}$ as the inverse function to $\cosh$:


    $\beta f_2(\beta, 0) = \frac12 \log 2\sinh(2\beta J)\, +$
    $+ \int_{-\pi}^{\pi} \frac{dq}{4\pi}\, \cosh^{-1}\big( \cosh 2\beta J \cosh 2\beta J^* + \sinh 2\beta J \sinh 2\beta J^* \cos q \big)$.    (7.2.1)


A simple analysis of the $\beta$ dependence of this function shows that it is
singular at the value $\beta = \beta_c$ for which it is $J = J^*$ (i.e. $\sinh 2\beta J = 1$) and
the singularity appears as a logarithmic divergence of the second derivative of $f_2$
with respect to $\beta$, i.e. as a divergence of the specific heat.
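
A formula such as (7.2.1) is easily evaluated by quadrature. The sketch below assumes the normalization of the integral as an average over q in [-π, π] with weight 1/(4π), and the definition tanh βJ* = e^{-2βJ} of J*. Under these assumptions it reproduces the low temperature value βf₂ ≈ 2βJ (all spins aligned, two bonds per site) and the leading high temperature behaviour log 2 + 2 log cosh βJ. The name `beta_f2` and the number `n` of quadrature nodes are our choices.

```python
import math

def beta_f2(K, n=4000):
    """Numerical value of (7.2.1) at K = beta*J > 0, midpoint rule with n nodes."""
    K_star = -0.5 * math.log(math.tanh(K))          # beta*J*, from tanh(K*) = exp(-2K)
    a = math.cosh(2.0 * K) * math.cosh(2.0 * K_star)
    b = math.sinh(2.0 * K) * math.sinh(2.0 * K_star)
    dq = 2.0 * math.pi / n
    integral = 0.0
    for i in range(n):
        q = -math.pi + (i + 0.5) * dq
        # max(1, .) only guards against rounding errors: the argument is >= 1
        integral += math.acosh(max(1.0, a + b * math.cos(q))) * dq
    return 0.5 * math.log(2.0 * math.sinh(2.0 * K)) + integral / (4.0 * math.pi)

for K in (0.05, 0.3, 0.6, 1.0, 3.0):
    high_T = math.log(2.0) + 2.0 * math.log(math.cosh(K))
    print(f"beta*J = {K:4.2f}   beta*f2 = {beta_f2(K):.8f}   "
          f"log2 + 2 log cosh = {high_T:.8f}   2*beta*J = {2.0 * K:.2f}")
```

Close to sinh 2βJ = 1 the integrand has a cusp at q = π, so a finer grid (or an adaptive rule) would be needed there to resolve the logarithmic singularity.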

The exact value of the spontaneous magnetization, i.e. of the (right) h-
derivative at h = 0 of the free energy, can also be computed as said above
and the result is


    $m(\beta) \equiv \lim_{h\to 0^+} \frac{\partial f_2}{\partial h}(\beta, h) = \begin{cases} 0 & \text{if } \sinh 2\beta J < 1 \\ \big( 1 - (\sinh 2\beta J)^{-4} \big)^{1/8} & \text{otherwise.} \end{cases}$    (7.2.2)

The importance of the above formulae, besides their obvious beauty, can
hardly be overestimated. For instance they proved that in statistical me-
chanics there are phase transitions with critical exponents different from
those of mean field theory: e.g. from (7.2.2) one realizes that $m(\beta) \to 0$ for $\beta \to \beta_c$
as $(\beta - \beta_c)^{1/8}$ rather than as the $(\beta - \beta_c)^{1/2}$ foreseen by mean field theory (see
Ch.V, §5.1 and §5.2).
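
The exponent 1/8 can be read off (7.2.2) numerically: the logarithmic slope of m between two temperatures just below the critical one approaches 1/8. In the sketch K = βJ, and the function names are ours.

```python
import math

K_C = 0.5 * math.log(1.0 + math.sqrt(2.0))   # beta_c * J, where sinh(2K) = 1

def spontaneous_magnetization(K):
    """Right-hand side of (7.2.2) as a function of K = beta*J."""
    s = math.sinh(2.0 * K)
    return 0.0 if s <= 1.0 else (1.0 - s ** -4) ** 0.125

def local_exponent(d1, d2):
    """Slope of log m versus log(K - K_C) between the distances d1 < d2 from K_C."""
    m1 = spontaneous_magnetization(K_C + d1)
    m2 = spontaneous_magnetization(K_C + d2)
    return math.log(m2 / m1) / math.log(d2 / d1)

print("m at beta*J = 0.3 :", spontaneous_magnetization(0.3))
print("m at beta*J = 1.0 :", spontaneous_magnetization(1.0))
print("local exponent    :", local_exponent(1e-6, 1e-4))
```

The mean field value 1/2 would correspond to a much faster vanishing of m at the critical point.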
Many other quantities have been computed exactly: some already in the
original Onsager papers and others in subsequent works. Among them we
quote:


(1) The correlation function $\langle\sigma_O\sigma_x\rangle$ where $O$ denotes
the origin and $x$ is a lattice point on one of the two lattice axes or,
alternatively, it is a lattice point on the main diagonal; the symbol
$\langle\cdot\rangle$ denotes the average value of the quantity inside the
brackets with respect to the Gibbs equilibrium distribution.
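One shows that, if $\kappa(\beta) = 2|\beta J - J^*|$ and if $|x|$ is large,
then the function
$\langle\sigma_O\sigma_x\rangle_T \stackrel{def}{=} \langle\sigma_O\sigma_x\rangle - \langle\sigma_O\rangle^2$
is proportional to

$$\langle\sigma_O\sigma_x\rangle_T \propto \begin{cases} \dfrac{e^{-\kappa(\beta)|x|}}{\sqrt{|x|}} & \text{for } \beta<\beta_c \\[2mm] |x|^{-1/4} & \text{for } \beta=\beta_c \\[2mm] \dfrac{e^{-2\kappa(\beta)|x|}}{|x|^{2}} & \text{for } \beta>\beta_c. \end{cases} \tag{7.2.3}$$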

One should not view (7.2.3) as a discontinuity in the asymptotic behavior
of the pair correlation function when the temperature passes through the
critical temperature $T_c$. A more detailed analysis shows in fact that if $T$
is close to $T_c$ the correlation starts depending on $x$ as if $T = T_c$: this
however proceeds only until $|x|$ becomes so large as to be comparable to the
correlation length $\kappa(\beta)^{-1}$, which is longer and longer the closer
$T$ is to $T_c$. Afterwards the exponential decay sets in (with a different
power correction depending on whether $\beta<\beta_c$ or $\beta>\beta_c$).

In fact one can compute the asymptotic behavior of all correlation functions
(i.e. of the average value of products of spin values in an arbitrary
number of sites) in various regimes: for instance for $\beta\ne\beta_c$ when
the spin sites separate from each other homothetically, [WMTB76], [MTW77].

In the same situation a beautiful asymptotic formula for the value of the 2n 
spins correlation function for 2n spins aligned along a line has been derived 
by Kadanoff, [Ka69]. 


(2) The surface tension between coexisting phases, defined as, see (6.12.3), 
Ch.VI, 


a ete Ma ABO) 0 if 8 < be 
7(8) = jim T 08 E D) = Lg if 8 > Be 


(7.2.4) 


where Z++, Z+- denote respectively the partition functions of the models 
obtained by fixing the boundary spins all equal to +1 in the first case and 
equal to +1 in the upper half and to —1 in the lower half in the second 
case. Here $L$ is the perimeter of the container $\Omega$, which is assumed to be a
square. See [On44], [GMM72], [@M72b], [AM73]. 


(3) Quite a lot is known about correlation functions of spins associated
with boundary sites. In the case of a box with open boundary conditions
(i.e. no interaction with the spins located at external sites) it is, for
instance, remarkable that for $\beta\to\beta_c$ the spontaneous magnetization
on the boundary$^1$ does not tend to $0$ with the same critical exponent
$\frac18$ characteristic of the bulk magnetization (i.e. of the magnetization
at a site at a finite distance from the origin, hence infinitely far from the
boundary): the new critical exponent is in fact $\frac12$ (in the limit as
$\Omega\to\infty$, of course), [Ab84].

$^1$ Whose square is defined here as the limit as $L\to\infty$ of the
correlation $\langle\sigma_x\sigma_y\rangle$ with $x, y$ being two points on
$\partial\Lambda$ and at distance $O(L)$.

Even more remarkable is that one can find the critical exponent of the
magnetization computed at a corner, where it is $1$; more generally one
considers the Ising model in a wedge-shaped planar lattice with opening
angle $\vartheta$ (so that $\vartheta = \pi$ is the case of a point in the
middle of a side of the container, whose shape does not matter as long as the
point is at a distance tending to infinity from the points where the boundary
starts bending: “half-plane case”); and $\vartheta = \frac\pi2$ is the case of
a “square corner”. The conjecture is that the critical exponent for the
spontaneous magnetization at a “corner with opening $\vartheta$” is
$\frac{\pi}{2\vartheta}$. This has been proved (together with various other
results about the magnetization at points a finite distance away from a
corner) for $\vartheta = \pi$ (exponent $\frac12$, as mentioned above) and
$\vartheta = \frac\pi2$, to which there corresponds an exponent $1$, see [AL95].


§7.3. Vertex Models.


Consider a rectangular region $\Omega\subset Z^2$ with opposite sides
identified (periodic boundary conditions). We imagine that the microscopic
states of the system are obtained by fixing an orientation on each lattice
bond linking nearest neighbors of $\Omega$.

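Given a microscopic configuration $\underline\sigma$ of the system, at every
lattice site we shall see one among the 16 possibilities shown in Fig. 7.3.1
below.

[Fig. 7.3.1: the sixteen possible arrow configurations at a lattice site,
grouped into the classes A, B, C, D, E, F.]
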

The eight-vertex models or 8V-models are characterized by allowing only
the configurations $\underline\sigma$ in which at every lattice site the bond
orientations look as in A, B, C, D, see p. 128, p. 203 in [Ba82]. Furthermore
the energy associated with a configuration is, in the general eight-vertex
model, a sum of contributions $\varepsilon_j$ coming from each lattice site
$j$. Allowing only vertices A, B, C generates the six-vertex models, or
6V-models.

The vertices A, B are called polar vertices while the C, D are called non
polar: the A vertices are usually labeled as 1 and 2, the B vertices are
labeled 3 and 4, the C are 5 and 6, the D are 7 and 8. We denote by $c$
one of the latter eight arrow configurations: an 8V configuration will be a
collection of arrows put on the lattice bonds that at each lattice node form
one among the eight arrow configurations, called “allowed vertices”.

Each lattice site $x$ gives to the total energy of the configuration an
additive contribution $\varepsilon(c)$ depending on which of the eight possible
vertices $c$ is formed by the arrows entering or exiting the lattice site $x$.

Not all the eight-vertex models are “exactly soluble”: one gets a soluble
model if the energies of the vertices 1,2, i.e. A, are equal and so are those
of the vertices 3,4, i.e. B, of 5,6, i.e. C, and of 7,8, i.e. D. Thus the
family of eight vertex soluble models has three parameters (because one of the
four can be eliminated by recalling that the total energy is defined up to an
additive constant).$^2$

We call $\varepsilon_A \stackrel{def}{=} \varepsilon_1 = \varepsilon_2$ the
common value of the energies corresponding to the vertices 1,2, and likewise
we set $\varepsilon_B \stackrel{def}{=} \varepsilon_3 = \varepsilon_4$,
$\varepsilon_C \stackrel{def}{=} \varepsilon_5 = \varepsilon_6$ and
$\varepsilon_D \stackrel{def}{=} \varepsilon_7 = \varepsilon_8$.


The eight vertex models can be interpreted as spin models (hence as lattice 
gas models). A trivial way to do this is to interpret an arrow configuration 
as a spin configuration with the lattice of the spins being the lattice of the 
bonds, see Fig. 7.3.1; the up and down arrows can be identified with + and 
— spins located at the center of the arrows and, likewise, the right and left 
arrows can be identified with + and — spins. This naive procedure however 
relates the 8V models to spin models with constraints or hard cores because 
not all 16 configurations of spins on the arrows relative to a vertex are going 
to be possible. 

A much more interesting representation of the eight vertex models is ob- 
tained by considering Ising models in which interactions between next near- 
est neighbor spins and many spins interactions occur (see §5.10) between 
quadruples of spins involving the four spins of a unit lattice cell, see p.207 
in [Ba82]. An excellent introduction to the vertex models and their rela- 
tionship with other models can be found in [LW72],[Ka74]. 

We call the lattice of the centers of the vertices the “8V lattice” and we 
consider a configuration of arrows which at each point of the 8V lattice is 
one of the eight allowed configurations. We define a configuration of signs 
$+$ or $-$ located at midpoints of the bonds of the 8V lattice: $+$ represents
an up or right arrow while $-$ represents a down or left arrow.

The product of the signs of the four bonds of the 8V lattice that merge into
a vertex must be $+$: this is the condition that all vertices are of the above
first eight types. Therefore we have a one-to-one correspondence between
the eight vertex configurations on the 8V lattice and the sign configurations
at the centers of the bonds which multiply to $+$ on each quadruple of 8V
lattice bonds that merge into a vertex.

$^2$ In fact the restriction $\varepsilon_5 = \varepsilon_6$ is not really such
because the total energy depends only on $\varepsilon_5 + \varepsilon_6$.
Furthermore if $\varepsilon_7 = \varepsilon_8 = 0$ the models with
$\varepsilon_1 \ne \varepsilon_2$ and $\varepsilon_3 \ne \varepsilon_4$ are also
soluble so that the class of soluble 6V models has four parameters, one more
than the 8V model, see below.

We imagine putting a spin $\sigma = \pm1$ at the center of each square, also
called a “plaquette”, formed by four of the 8V lattice bonds; the lattice of
the centers of the 8V lattice plaquettes will be called the Ising lattice and
it will carry in this way a spin configuration $\underline\sigma$.

The product of the spins on nearest neighbor sites $v, w$ of the Ising lattice
is a sign $\pm1$ (we use this name rather than spin to stress the difference
between these auxiliary variables and the spins introduced above). This sign
can be naturally associated with the bond of the 8V lattice that separates
$v, w$, and by construction the product of the signs on the sides that merge
into a vertex is necessarily $+$ so that the sign configuration can be
interpreted as an 8V configuration and there is a one-to-two correspondence
between 8V configurations and Ising spin configurations. In fact the spin
configurations $\underline\sigma$ and $-\underline\sigma$ give the same sign
configuration, hence the same 8V configuration.
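
The local content of this correspondence can be checked by enumeration. The sketch below (our own illustration, restricted to the four plaquettes surrounding a single vertex) lists all 16 assignments of the four Ising spins, computes the signs of the four bonds that merge into the vertex, and checks that the signs always multiply to $+$ and that exactly eight sign patterns arise, each from a pair of opposite spin assignments.

```python
from itertools import product

def bond_signs(spins):
    """Signs on the four 8V bonds meeting at a vertex.

    `spins` lists the Ising spins of the four plaquettes around the vertex
    in cyclic order; each bond separates two consecutive plaquettes.
    """
    s1, s2, s3, s4 = spins
    return (s1 * s2, s2 * s3, s3 * s4, s4 * s1)

preimages = {}
for spins in product((1, -1), repeat=4):
    signs = bond_signs(spins)
    # each spin enters the product twice, so the result is always +1
    assert signs[0] * signs[1] * signs[2] * signs[3] == 1
    preimages.setdefault(signs, []).append(spins)

if __name__ == "__main__":
    print(len(preimages), "local sign patterns")
    for signs, spin_list in sorted(preimages.items()):
        print(signs, "<-", spin_list)
```

The eight patterns are the eight allowed vertices; the two preimages of each pattern differ by a global spin flip.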

Setting $a = \exp(-\beta\varepsilon_A)$, $b = \exp(-\beta\varepsilon_B)$,
$c = \exp(-\beta\varepsilon_C)$, $d = \exp(-\beta\varepsilon_D)$
and defining $J, J', J''$ by:

$$\begin{aligned} a &= \exp\beta(J+J'+J''), & b &= \exp\beta(-J-J'+J''),\\ c &= \exp\beta(-J+J'-J''), & d &= \exp\beta(J-J'-J''), \end{aligned} \tag{7.3.1}$$

it is now immediate to check that any configuration of the 8V model has
the same energy as the corresponding spin configurations $\pm\underline\sigma$
in the Ising model with energy:

$$H(\underline\sigma) = -\sum_i \big(J\sigma_i\sigma_{i'} + J'\sigma_i\sigma_{i''} + J''\sigma_i\sigma_j\sigma_{j'}\sigma_{j''}\big) \tag{7.3.2}$$

where the sum runs over the sites $i \in \Omega$ and $i'$ denotes the nearest
neighbor of $i$ along the diagonal of the second and fourth quadrant, and $i''$
the one along the first and third quadrant diagonal; $j, j', j''$ are three
sites that, together with $i$, form a unit square (with $i$ in the lower left
corner), see p. 207 in [Ba82].

One can consider also the sixteen-vertex models, obtained by considering 
all arrow configurations in Fig. 7.3.1 above, including the E,F ones. This 
model is also equivalent to a suitable Ising model, with also three-spin in- 
teractions, see [LW72], p. 350, for the discussion of the general cases. 

This model has many interesting special cases, some of which were
recognized to be soluble before Baxter’s work. In fact the breakthrough in
the whole theory was the solution by Lieb of Pauling’s six-vertex “ice model”,
which revealed the method of solution for the other soluble 6V-models.

Among the latter there are the six-vertex models, whose configurations
only allow for the vertices A, B, C, which give contributions
$\varepsilon_A, \varepsilon_B, \varepsilon_C$ to the energy.




(1) The just mentioned Pauling’s ice model fixes
$\varepsilon_A = \varepsilon_B = \varepsilon_C = 0$ (it corresponds to
$J' = J_0$, $J = -J_0$, $J'' = J_0$ and $J_0 \to +\infty$).

(2) The KDP model fixes the B, C vertices energies to $\varepsilon > 0$ and to
$0$ those of the vertices A: the occurrence of “non polar” vertices and of two
of the four polar ones is energetically less favored.

(3) The F model fixes the energies of A, B to $\varepsilon > 0$ and those of C
to $0$: the non polar vertices are favored.
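
The limiting procedure in item (1) can be made concrete with a few lines of Python. The sketch below uses the relations (7.3.1) between the weights $a,b,c,d$ and the couplings $J,J',J''$ as we read them; the function names are ours. With $J'=J_0$, $J=-J_0$, $J''=J_0$ the weights $a,b,c$ stay equal while $d/a$ vanishes as $J_0$ grows, so that only the six vertices A, B, C survive, with equal energies.

```python
import math

def vertex_weights(beta, j, j1, j2):
    """Weights a, b, c, d of (7.3.1) for couplings J, J', J''."""
    a = math.exp(beta * (j + j1 + j2))
    b = math.exp(beta * (-j - j1 + j2))
    c = math.exp(beta * (-j + j1 - j2))
    d = math.exp(beta * (j - j1 - j2))
    return a, b, c, d

def ice_limit_ratios(beta, j0):
    """Ratios b/a, c/a, d/a for J' = J0, J = -J0, J'' = J0."""
    a, b, c, d = vertex_weights(beta, -j0, j0, j0)
    return b / a, c / a, d / a

if __name__ == "__main__":
    for j0 in (1.0, 2.0, 4.0):
        print(j0, ice_limit_ratios(1.0, j0))
```

Only ratios of weights matter here, because the total energy is defined up to an additive constant.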


Such models were solved (i.e. their free energy was calculated) in rapid 
succession after the solution of the first, [Li67a], [Li67b], [Li67c], [Su67]. 
The analysis and solution of the most general 6V model are in [LW72].

The 6V models above are limiting cases of the 8V model in which the 
couplings $J, J', J''$ in (7.3.2) tend (suitably) to $\infty$. Because of this limiting
procedure they have been sometimes regarded as “pathological”, see below. 
The 6V models have a physical meaning within the theory of the hydrogen 
bond and of similar chemical bonds. In the ice model the lattice sites 
represent the sites occupied by O, and the bond orientations tell where the 
two H atoms are located: if an arrow emerges from a lattice site this means 
that an H atom is located on the bond and near the site. The association 
of the arrows with the bond provides a two-dimensional version of the ice 
rule (which states that on any bond there is one H atom, and not more, 
located closer to one of the two oxygens), a rule deduced by Pauling from 
the observation that the ice entropy is lower than the one it would have if 
in an ice crystal the H atoms could be found, unconstrained, near every O 
atom (so that one could find configurations like E, F or even with opposite
arrows on each bond). Of course the model should be, to be realistic, three 
dimensional, but the appropriate three-dimensional version is not exactly 
soluble. 

The KDP-model has been proposed as a model for the ferroelectric properties
of KH$_2$PO$_4$: a substance that crystallizes in tetrahedra with KPO$_4$
at the center and the two H atoms on the lines between the KPO$_4$: only
one H can be located on each such line and it can be there in two positions
(i.e. near one or the other extreme). KH$_2$PO$_4$ is a polar molecule without
spherical symmetry so that not all dipoles give equal contribution to the
total energy of a configuration. In the two-dimensional version of the model
the two nonpolar vertices C and two of the polar ones (e.g. B) are unfavored
and contribute energy $\varepsilon > 0$ while the others contribute energy $0$:
at low temperature a spontaneous polarization, or ferroelectricity, is expected
to occur.

The F-model, instead, is a model for an antiferroelectric polar material 
resisting (at low temperature at least) polarization by a field. 

A deeper discussion of the physical interpretation of the 6V models and of 
their relation with other remarkable combinatorial problems and statistical 
mechanics models can be found in [LW72]. The vertex models are equivalent 
in various senses to several other models, see for instance, [VB77], [Ba82]. 

The problem of the existence of the thermodynamic limit for the 8V models
is a special case of the general theory, see Ch.IV, with the minor
modification required to study lattice spin systems rather than continuous
particle systems (see (7.3.2): an easier problem, in fact). However the
situation is different in the case of the 6V models, because the constraints
imposed by arrow configurations can propagate the boundary conditions on a
large box all the way inside it. The theory has been satisfactorily developed
in [LW72].

In the ice model case one finds, [Li67a],

$$\lim_{\Omega\to\infty} \frac{1}{|\Omega|}\log Z(\Omega) = \frac32 \log\frac43 \tag{7.3.3}$$

which is really beautiful!
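
To get a feeling for (7.3.3) one can count ice configurations by brute force on a very small periodic lattice. The following sketch is our own illustration; the torus size and the bookkeeping of the arrows are choices made here, not taken from the text. It enumerates all arrow assignments on a $2\times2$ torus, keeps those with two arrows in and two out at every site, and compares the resulting number of configurations per site with the limiting value $(4/3)^{3/2}$. The two numbers differ noticeably, since the lattice is tiny.

```python
from itertools import product

def count_ice_configurations(size):
    """Count arrow configurations obeying the ice rule on a size x size torus.

    h[(x, y)] = +1 if the arrow on the bond from (x, y) to (x+1, y) points right,
    v[(x, y)] = +1 if the arrow on the bond from (x, y) to (x, y+1) points up.
    """
    sites = [(x, y) for x in range(size) for y in range(size)]
    count = 0
    for arrows in product((1, -1), repeat=2 * len(sites)):
        h = dict(zip(sites, arrows[:len(sites)]))
        v = dict(zip(sites, arrows[len(sites):]))
        ok = True
        for (x, y) in sites:
            # arrows leaving minus arrows entering must vanish (two in, two out)
            outflow = (h[(x, y)] - h[((x - 1) % size, y)]
                       + v[(x, y)] - v[(x, (y - 1) % size)])
            if outflow != 0:
                ok = False
                break
        if ok:
            count += 1
    return count

lieb_constant = (4.0 / 3.0) ** 1.5   # exponential of the right-hand side of (7.3.3)

if __name__ == "__main__":
    n = 2
    z = count_ice_configurations(n)
    print("configurations on the", n, "x", n, "torus:", z)
    print("per site:", z ** (1.0 / n ** 2), " limit:", lieb_constant)
```

The enumeration grows as $2^{2n^2}$, so it is only practical for $n=2$ or $n=3$.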


In the F-model case the free energy is given by, if $\Delta = 1 - \frac12 e^{2\beta\varepsilon}$,


— Bfr(B) = —be+ 
+ [los et y if cosu = |A| < 1 (7.3.4) 
A ae NLL if coshÀ = —-A > 1 


and in the KDP-model case, setting $\Delta = \frac12 e^{\beta\varepsilon}$, the free energy is:


— Bfxop(B) = 
1 co  cosha—cos da : = y 7.3.5 
= { Bu J AEE cosh ra/2y if A = = cosu < 1 ( ) 
0 otherwise 


where the above solution for the KDP-model is due to Lieb, [Li67b], and
that for the F-model to Lieb and Sutherland, [Li67c], [Su67]. See §7.5 below
for a technical introduction.

Furthermore the F-model and the KDP-model, unlike the soluble eight
vertex models, can be solved even in the presence of an “electric field” $E$,
if such field is modeled by assuming that the energy contribution of a vertex
increases by $-Ep$ with $p$ being the number of arrows pointing up minus the
number of those pointing down. The solution in the presence of such an
electric field is fairly simple but we do not report it here. It is a model
with one more free parameter in which the energies
$\varepsilon_1, \varepsilon_2$ are different and the energies
$\varepsilon_3, \varepsilon_4$ are also different by the same amount.

The even more general model in which the electric field has also a horizontal
component so that the energy of a vertex increases also by $-E'q$ with $q$
being the number of arrows pointing right minus the number of those pointing
left is also soluble although the result is not as simple to discuss, [LW72].
Taking this into account amounts to a freedom of taking $\varepsilon_j$,
$j = 1,\ldots,4$ as independent parameters. Hence the number of free parameters
in the most general 6V soluble model is 4 (they are
$\varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_4, \varepsilon_5 = \varepsilon_6$
and one can be fixed to be 0).

The elementary analysis of the above formulae, and of their extensions
to cases with $E \ne 0$, leads to the following results that we describe by
denoting by $f_F(\beta,E)$ and $f_{KDP}(\beta,E)$ the free energies of the two
models at temperature $(k_B\beta)^{-1}$ and in the presence of a (vertical)
electric field $E$.

The function $f_F(\beta,0)$ is infinitely differentiable and analytic in
$\beta$ except for an essential singularity at the value $\beta = \beta_c$
where $\Delta = -1$; one has, therefore, in zero field a phase transition of
infinite order (the order of a transition as a function of a parameter can be
defined to be the order of the lowest derivative of the free energy, with
respect to the parameter, which is singular, see Ch.VI, §6.5).

The polarization, defined as the average number of arrows pointing up
minus those pointing down, is proportional to the derivative of $f_F$ with
respect to $E$; it vanishes for $E = 0$ for all values of $\beta$; but if
$\beta > \beta_c$ (low temperature) it remains $0$ even if $E \ne 0$, for a
while, and it becomes nonzero only if $E$ grows beyond a critical value
$E_c(\beta)$ and in this sense the model has an antiferromagnetic behavior. If
one keeps $E$ fixed, then by varying $\beta$ one finds a second-order phase
transition with a specific heat singularity proportional to
$(\beta - \beta_c)^{-1/2}$.

The free energy of the KDP-model, $f_{KDP}(\beta,E)$, is essentially different.
Also in this case there is a critical temperature $\beta = \beta_c$ at zero
field $E$ (defined by $\Delta = 1$): in zero field and if $\beta > \beta_c$ the
polarization has value $1$ identically, and the free energy is constant; if
$\beta \to \beta_c^-$ the specific heat tends to $0$ as
$(\beta_c - \beta)^{1/2}$ but the internal energy does not tend to zero
although the value of the internal energy for $\beta > \beta_c$ is $0$,
therefore there is a phase transition of first order with latent heat and at
low temperature there is spontaneous polarization (maximal, $p = 1$, so that
the system is “frozen” and it has trivial thermodynamic functions).
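
The two zero-field critical points can be located explicitly from the conditions $\Delta=-1$ (F-model) and $\Delta=1$ (KDP-model). The sketch below takes $\Delta=1-\frac12e^{2\beta\varepsilon}$ for the F-model and $\Delta=\frac12e^{\beta\varepsilon}$ for the KDP-model, as we read the definitions preceding (7.3.4) and (7.3.5), and solves the two conditions by bisection in the variable $\beta\varepsilon$. The helper names and the bracketing interval are our own choices.

```python
import math

def delta_f_model(beta_eps):
    """Delta = 1 - exp(2 beta eps) / 2 for the F-model."""
    return 1.0 - 0.5 * math.exp(2.0 * beta_eps)

def delta_kdp_model(beta_eps):
    """Delta = exp(beta eps) / 2 for the KDP-model."""
    return 0.5 * math.exp(beta_eps)

def bisect_root(func, target, lo=0.0, hi=10.0, steps=200):
    """Solve func(x) = target by bisection, assuming one sign change on [lo, hi]."""
    f_lo = func(lo) - target
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid) - target
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

beta_c_f = bisect_root(delta_f_model, -1.0)     # F-model:   Delta = -1
beta_c_kdp = bisect_root(delta_kdp_model, 1.0)  # KDP-model: Delta = +1

if __name__ == "__main__":
    print("F-model   beta_c * eps =", beta_c_f)
    print("KDP-model beta_c * eps =", beta_c_kdp)
```

With these expressions for $\Delta$ both conditions reduce to $e^{\beta_c\varepsilon}=2$, so the two models share the same value of $\beta_c\varepsilon$ even though their transitions are of very different nature.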

The above properties, selected among many that can be derived by simply 
examining the expressions for the exact solutions, show the richness of the 
phenomenology and their interest for the theory of phase transitions, in 
particular as examples of phase transitions with properties deeply different 
from those found in the Ising model case, [LW72]. 

It is important to stress that the 6V models form a four-parameters class 
of models and some of the soluble 6V models are limiting cases of the three- 
parameter family of soluble 8V models. By varying the parameters one can 
find a continuous path linking the F-model critical point (in zero field) to 
the KDP-model critical point (also in zero field). 

As remarked above the 8V models are genuine short-range Ising models, 
(7.3.2), with finite couplings. Hence one can study how the critical point sin- 
gularity changes in passing from the F to the KDP-model. The remarkable 
result found via the solution of the 8V models is that the critical exponents 
(the ones that can be computed) change continuously from the F values to 
the KDP values. 

This fact has great importance: at the time of the solution of the ice model 
and of the consequent solution of the F and KDP models the universality 
theory of critical point singularity was not yet developed in its final form. 
So when the renormalization group approach arose around 1969, see [BG95]
for a review and references, the 6V model appeared as a counterexample to
the universality that the renormalization group was supposed to predict.

This was stressed by Lieb on several occasions: regrettably his comments 
were either not understood or they were dismissed on the grounds that the 
6V models regarded as Ising models were “models with constraints”, in the
sense discussed above in the derivation of (7.3.2). 

Baxter’s solution of the 8V model made Lieb’s point of view clear to
everybody. Even genuine finite-range “unconstrained” Ising models could show a
critical singularity that was neither that of the F-model, nor that of the
KDP-model, nor that of the Ising model (to which the 8V models reduce if
$J'' = 0$). The three behaviors were the limiting cases of a three-parameter
continuum of possibilities, so that the phenomena shown by the 6V models
were not examples of pathological properties dubiously attributable to the
hard core features of such models,$^3$ but they were the rule! This led to a
much better understanding of the theories that were put forward to explain
the universality phenomena, first among which the renormalization group
itself.

It would take too long to discuss the eight-vertex model properties, [Ba82]: 
it is not surprising that it offers a varied and interesting phenomenology, 
besides the enormous theoretical interest of the sophisticated analysis nec- 
essary to obtain the solutions. As mentioned it can be solved only in zero 
field (i.e. for $\varepsilon_1 = \varepsilon_2$, $\varepsilon_3 = \varepsilon_4$, $\varepsilon_5 = \varepsilon_6$, $\varepsilon_7 = \varepsilon_8$). Some results in nonzero
field can be obtained by “perturbing” the zero field models, see [MW86]. 

It is however important to stress once more that the exactly soluble models
give very limited information about the actual thermodynamics of the system.
For instance the information that can be obtained about correlations,
even just pair correlation functions, is very scanty.

There are a few remarkable cases in which one can compute explicitly the
pair correlation function, like the two-dimensional Ising model, see [MW73],
[WMTB76], [MTW77], [AR76], or even higher correlations, like the $2n$-spins
correlations in the two-dimensional Ising model when the spins are
on the same lattice line and far apart, [Ka69]. Correlation inequalities can
be very useful to study more general situations without having recourse to
exact solutions, [MM77].

Often one is able to evaluate the correlation length because it can be re- 
lated to the second or third largest eigenvalue of the transfer matrix (often 
the highest is almost degenerate and what counts is the third) which can 
be studied quite explicitly (for instance in the 8V models), see [Ba82] p. 
241,284. 

Even the latter results provide little information about the “critical” cases
when there is no gap isolating the top of the transfer matrix spectrum from
the rest. The renormalization group methods, which received so much
clarification from the exact solution of the 6V and 8V models, do provide
at least in some regions of the coupling parameters $a, b, c, d$, (7.3.1), a
rather detailed analysis of the properties of the correlation decay. For a
recent development see [Ma98].

$^3$ Because as mentioned above their statistical mechanics properties are
quite normal, as shown in [LW72], p. 354-361.

There are many other exactly soluble models; e.g. one could just men- 
tion the spherical model, [BK52], the dimer model, [Ka61] and [TF61], 
the XY-model, the ground state of the one-dimensional Heisenberg model 
(whose solution can be related to that of the six and eight vertex models 
[YY66],[Ba82], see also [Li67a],[Su70],[Ka74]), besides the Ising model on
some lattices other than the square lattice and the hard hexagon model on 
a triangular lattice, [Ba82]. The reader should consult the monographs on 
the subject [MW73], [Ba82] to which a good introduction is still the review 
paper [SML64] and the book [LM66]. 


§7.4. A Nontrivial Example of Exact Solution: the Two-Dimensional Ising
Model


The actual computations for the exact solutions are always quite involved 
but their elegance surpasses Jacobi’s theory of the action angle variables 
for the pendulum. To give an idea of the procedures involved we describe 
below the classical solution of the Ising model, as presented in the paper 
[SML64].$^4$

One starts from the remark in §6.15 that the free energy of the Ising model
in an N x M box with periodic boundary conditions is given by the trace 
(7.1.8) of the M-th power of the matrix V in (7.1.7). 

It is easy to construct a convenient representation for the matrix $V$.
Consider the three Pauli matrices

$$\sigma^x=\begin{pmatrix}0&1\\1&0\end{pmatrix},\quad \sigma^y=\begin{pmatrix}0&-i\\i&0\end{pmatrix},\quad \sigma^z=\begin{pmatrix}1&0\\0&-1\end{pmatrix} \tag{7.4.1}$$

and consider the tensor product $H$ of $N$ two-dimensional linear spaces $E$
and the operators $\sigma^\alpha_j$ defined by:

$$H=\bigotimes_{j=1}^N E = E\otimes E\otimes\ldots\otimes E,\qquad \sigma^\alpha_j = I\otimes\ldots\otimes\sigma^\alpha\otimes I\otimes\ldots\otimes I,\quad \alpha=x,y,z \tag{7.4.2}$$

where $I$ is the identity operator on $E$ and $\sigma^\alpha$ is located at the
$j$-th place.

The operators $\sigma^z_j$ are pairwise commuting and one can easily
diagonalize them. If $|\sigma\rangle \in E$ is a vector such that
$\sigma^z|\sigma\rangle = \sigma|\sigma\rangle$ (with $\sigma = \pm1$), then
the most general eigenvector of the operators $\sigma^z_j$ on $H$ can be
written as a tensor product of vectors:

$$|\underline\sigma\rangle = \bigotimes_{j=1}^N |\sigma_j\rangle,\qquad \underline\sigma=(\sigma_1,\ldots,\sigma_N) \tag{7.4.3}$$

and the $2^N$ such vectors (each corresponding to a string
$\underline\sigma = (\sigma_1,\ldots,\sigma_N)$ of digits $\sigma_j = \pm1$)
form an orthonormal basis in $H$.

$^4$ This was important progress as it provided a simple and easily
understandable entirely new approach to the solution of the Ising model, at a
time when it rested on the original works [On44], [Ka49], [Ya52] which were
still considered very hard to follow in the 1960s.


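Consider the operator $w$ on $E$ such that

$$\langle\sigma|w|\sigma'\rangle = e^{\beta J\sigma\sigma'},\qquad \sigma,\sigma'=\pm1 \tag{7.4.4}$$

which can be written as a matrix as

$$w=\begin{pmatrix}e^{\beta J}&e^{-\beta J}\\e^{-\beta J}&e^{\beta J}\end{pmatrix} = e^{\beta J}I + e^{-\beta J}\sigma^x = A\,e^{J^*\sigma^x} \tag{7.4.5}$$

with $A = (2\sinh 2\beta J)^{1/2}$ and $\tanh J^* = e^{-2\beta J}$. We see from
(7.4.4) that the tensor product

$$V_1=\bigotimes_{j=1}^N w = w\otimes\ldots\otimes w = A^N e^{J^*\sum_{j=1}^N\sigma^x_j} \tag{7.4.6}$$

has matrix elements

$$\langle\underline\sigma|V_1|\underline\sigma'\rangle = \prod_{j=1}^N e^{\beta J\sigma_j\sigma'_j}. \tag{7.4.7}$$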


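Hence if we define the operator $V_2$ on $H$, diagonal on the basis consisting
in the vectors $|\underline\sigma\rangle$, which on $|\underline\sigma\rangle$
acts as

$$V_2|\underline\sigma\rangle = \prod_{j=1}^N e^{\frac12\beta J\sigma_j\sigma_{j+1}}\,|\underline\sigma\rangle \tag{7.4.8}$$

then we immediately check that the matrix elements of the operator
$V = V_2V_1V_2$ between $|\underline\sigma'\rangle$ and
$\langle\underline\sigma|$ are exactly the transfer matrix elements
$V_{\underline\sigma\,\underline\sigma'}$ in (7.1.7).

This means that the problem of the evaluation of the partition function
(7.1.8) is solved once we know the eigenvalues of $V = V_2V_1V_2$, or of any
operator unitarily equivalent to it.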
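
The statement that the partition function is the trace of a power of the transfer matrix can be tested directly on a very small lattice. The sketch below is an illustration under stated assumptions: we take the transfer matrix between consecutive rows in the usual symmetric form (half of the intra-row energy of each row plus the inter-row energy), which is how we read (7.1.7) and (7.4.8). The trace does not depend on how the intra-row energy is split between the two rows. All function names are ours.

```python
import math
from itertools import product

def transfer_matrix(beta_j, n):
    """2^n x 2^n row-to-row transfer matrix for a periodic row of n spins."""
    states = list(product((1, -1), repeat=n))

    def row_energy(s):
        # sum over j of s_j s_{j+1} with periodic boundary conditions
        return sum(s[j] * s[(j + 1) % n] for j in range(n))

    return [[math.exp(beta_j * (0.5 * row_energy(s)
                                + sum(a * b for a, b in zip(s, t))
                                + 0.5 * row_energy(t)))
             for t in states] for s in states]

def trace_of_power(matrix, m):
    """Trace of the m-th power of a square matrix (plain nested loops)."""
    size = len(matrix)
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    for _ in range(m):
        result = [[sum(result[i][k] * matrix[k][j] for k in range(size))
                   for j in range(size)] for i in range(size)]
    return sum(result[i][i] for i in range(size))

def brute_force_partition_function(beta_j, n, m):
    """Sum of exp(beta J * sum of bond products) over all spins on an n x m torus."""
    total = 0.0
    for flat in product((1, -1), repeat=n * m):
        rows = [flat[r * n:(r + 1) * n] for r in range(m)]
        bonds = 0
        for r in range(m):
            for j in range(n):
                bonds += rows[r][j] * rows[r][(j + 1) % n]   # horizontal bond
                bonds += rows[r][j] * rows[(r + 1) % m][j]   # vertical bond
        total += math.exp(beta_j * bonds)
    return total

if __name__ == "__main__":
    print(trace_of_power(transfer_matrix(0.4, 3), 3))
    print(brute_force_partition_function(0.4, 3, 3))
```

The two printed numbers agree up to rounding; the brute force sum has $2^{NM}$ terms, which is why the eigenvalues of $V$ are the object of interest for large boxes.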

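We shall, therefore, perform the unitary transformation $U$ on $H$ such that:

$$U\sigma^x_jU^{-1} = \sigma^z_j,\qquad U\sigma^z_jU^{-1} = -\sigma^x_j \tag{7.4.9}$$

which transforms the matrix $V$ into $\tilde V = UVU^{-1}$:

$$\tilde V = A^N \Big(e^{\frac12\beta J\sum_j \sigma^x_j\sigma^x_{j+1}}\Big)\Big(e^{J^*\sum_j\sigma^z_j}\Big)\Big(e^{\frac12\beta J\sum_j\sigma^x_j\sigma^x_{j+1}}\Big) \tag{7.4.10}$$

which we can write, defining the matrices $\tilde V_j$, as
$\tilde V_2\tilde V_1\tilde V_2$.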
Here it will be very useful to consider the following Pauli-Jordan
transformation:

$$\sigma^\pm_j = \frac12(\sigma^x_j \pm i\sigma^y_j),\qquad a^\pm_j = \sigma^\pm_j\prod_{s=1}^{j-1}(-\sigma^z_s) \tag{7.4.11}$$

so that $\sigma^\pm_j = a^\pm_j\prod_{s<j}(1 - 2a^+_sa^-_s)$ and
$\sigma^+_j\sigma^-_j = a^+_ja^-_j$. The usefulness of (7.4.11) is due to the
remark that:

$$[a^+_j, a^+_{j'}]_+ = 0 = [a^-_j, a^-_{j'}]_+,\qquad [a^+_j, a^-_{j'}]_+ = \delta_{jj'} \tag{7.4.12}$$

if $[\cdot,\cdot]_+$ denotes the anticommutator. In other words the above
transformation changes the operators $\sigma^\pm_j$, which have mixed
commutation relations (i.e. commuting for $j \ne j'$ while anticommuting for
$j = j'$), into operators $a^\pm_j$ with purely fermionic commutation
relations. We set $\mathcal N = \sum_j a^+_ja^-_j$.
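
The relations (7.4.12) can be verified mechanically with explicit matrices. The sketch below is our own illustration: it builds $a^\pm_j=\sigma^\pm_j\prod_{s<j}(-\sigma^z_s)$ as Kronecker products for a short chain (sites are numbered from $0$ in the code) and checks the anticommutators, together with the identity $\sigma^x_j\sigma^x_{j+1}=(a^+_j-a^-_j)(a^+_{j+1}+a^-_{j+1})$ for neighboring sites inside the chain. Integer matrices are used so that all comparisons are exact.

```python
from itertools import product

I2 = [[1, 0], [0, 1]]
SIGMA_X = [[0, 1], [1, 0]]
SIGMA_PLUS = [[0, 1], [0, 0]]      # (sigma^x + i sigma^y) / 2
SIGMA_MINUS = [[0, 0], [1, 0]]     # (sigma^x - i sigma^y) / 2
MINUS_SIGMA_Z = [[-1, 0], [0, 1]]

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def chain(factors):
    """Tensor product of a list of one-site matrices."""
    result = [[1]]
    for factor in factors:
        result = kron(result, factor)
    return result

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b, sign=1):
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def fermion(j, kind, n):
    """a_j^+ (kind = +1) or a_j^- (kind = -1) on a chain of n sites."""
    local = SIGMA_PLUS if kind == 1 else SIGMA_MINUS
    return chain([MINUS_SIGMA_Z] * j + [local] + [I2] * (n - j - 1))

def sigma_x(j, n):
    return chain([I2] * j + [SIGMA_X] + [I2] * (n - j - 1))

def anticommutator(a, b):
    return add(matmul(a, b), matmul(b, a))

def check_relations(n):
    dim = 2 ** n
    zero = [[0] * dim for _ in range(dim)]
    identity = [[int(r == c) for c in range(dim)] for r in range(dim)]
    for j, k in product(range(n), repeat=2):
        if anticommutator(fermion(j, 1, n), fermion(k, 1, n)) != zero:
            return False
        if anticommutator(fermion(j, -1, n), fermion(k, -1, n)) != zero:
            return False
        expected = identity if j == k else zero
        if anticommutator(fermion(j, 1, n), fermion(k, -1, n)) != expected:
            return False
    for j in range(n - 1):
        # bond identity for neighbors inside the chain (no wrap-around)
        lhs = matmul(sigma_x(j, n), sigma_x(j + 1, n))
        rhs = matmul(add(fermion(j, 1, n), fermion(j, -1, n), sign=-1),
                     add(fermion(j + 1, 1, n), fermion(j + 1, -1, n)))
        if lhs != rhs:
            return False
    return True

if __name__ == "__main__":
    print(check_relations(3))
```

The bond that closes the chain, between the last and the first site, is deliberately left out of the check: it is the one that acquires the extra parity factor discussed in the text.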

This makes it easy to complete the calculations: one first remarks that the
parity operator
$(-1)^{\mathcal N} = \prod_j (1 - 2\sigma^+_j\sigma^-_j) = \prod_j (1 - 2a^+_ja^-_j)$
commutes with the factors defining $\tilde V$, hence it commutes with
$\tilde V$ itself.

Hence the space $H$ is a direct sum of two orthogonal subspaces $H^+$ and $H^-$
on which the parity operator has values respectively $+1$ and $-1$, i.e. $H^+$
contains an “even number of fermions” and $H^-$ an “odd number”.

This means that if $|\Omega\rangle$ is a vector in $H$ such that
$a^-_j|\Omega\rangle = 0$ for $j = 1,\ldots,N$, and one can see that there is
always one and only one such vector, then $H^+$ is the space spanned by the
vectors $a^+_{j_1}\ldots a^+_{j_{2n}}|\Omega\rangle$, $n \ge 0$, while $H^-$
is spanned by the vectors $a^+_{j_1}\ldots a^+_{j_{2n+1}}|\Omega\rangle$,
$n \ge 0$.


Assuming from now on that 4N is even, for simplicity, one can make the 
following key remark; if we define the new operators: 


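$$A^\pm_q = \frac{e^{\pm i\pi/4}}{\sqrt N}\sum_{j=1}^N e^{\pm iqj}a^\pm_j,\qquad q = \pm\frac{\pi}{N}, \pm\frac{3\pi}{N}, \ldots, \pm(N-1)\frac{\pi}{N} \tag{7.4.13}$$

where the $q$'s are defined so that $e^{iqN} = -1$, then on $H^+$ the following
algebraic identities hold:

$$\begin{aligned} a^\pm_j &= \frac{e^{\mp i\pi/4}}{\sqrt N}\sum_q e^{\mp iqj}A^\pm_q,\\ \tilde V_1 &= A^N e^{J^*\sum_q (2A^+_qA^-_q - 1)},\\ \tilde V_2 &= e^{\beta J\sum_{q>0}\big((A^+_qA^-_q + A^+_{-q}A^-_{-q})\cos q + (A^+_qA^+_{-q} + A^-_{-q}A^-_q)\sin q\big)}, \end{aligned} \tag{7.4.14}$$

where, in deriving the second and third relations, careful account has
been taken of the facts that while
$\sigma^x_j\sigma^x_{j+1} = (\sigma^+_j + \sigma^-_j)(\sigma^+_{j+1} + \sigma^-_{j+1}) = (a^+_j - a^-_j)(a^+_{j+1} + a^-_{j+1})$
for $j < N$ it is, instead,
$(\sigma^+_N + \sigma^-_N)(\sigma^+_1 + \sigma^-_1) = -(-1)^{\mathcal N}(a^+_N - a^-_N)(a^+_1 + a^-_1)$
and that on $H^+$ it is $(-1)^{\mathcal N} = +1$.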
The anticommutation relations (7.4.12) imply similar anticommutation
relations for the $A^\pm_q$ so that the expressions in (7.4.14) containing
different $\pm q$'s commute and therefore we can write, only on $H^+$,

$$\tilde V = A^N \prod_{q>0} P_q \tag{7.4.15}$$

with $A$ defined after (7.4.5) and

$$\begin{aligned} P_q = {}& e^{\beta J\big((A^+_qA^-_q + A^+_{-q}A^-_{-q})\cos q + (A^+_qA^+_{-q} + A^-_{-q}A^-_q)\sin q\big)}\cdot e^{2J^*(A^+_qA^-_q + A^+_{-q}A^-_{-q} - 1)}\\ &\cdot e^{\beta J\big((A^+_qA^-_q + A^+_{-q}A^-_{-q})\cos q + (A^+_qA^+_{-q} + A^-_{-q}A^-_q)\sin q\big)} \end{aligned} \tag{7.4.16}$$

and the operators $P_q$ commute. Hence they can be considered as $4 \times 4$
matrices acting on the space $D_q$ spanned by the vectors $|\Omega\rangle$,
$A^+_q|\Omega\rangle$, $A^+_{-q}|\Omega\rangle$, $A^+_qA^+_{-q}|\Omega\rangle$.

Furthermore it is clear that the operator $P_q$ has $A^+_q|\Omega\rangle$ and
$A^+_{-q}|\Omega\rangle$ as eigenvectors with eigenvalues $e^{2\beta J\cos q}$.
Hence the $\frac N2$ operators $P_q$ can be considered as operators on the
space $D^+_q$ spanned by the two vectors $|\Omega\rangle$,
$A^+_qA^+_{-q}|\Omega\rangle$. Note that such space is invariant under the
action of $P_q$.

Diagonalization of (7.4.15) will account only for $2^{N-1}$ eigenvalues and
eigenvectors because the operator $\tilde V$ coincides with the transfer matrix
only on $H^+$. The remaining $2^{N-1}$ are in the space $H^-$ and they can be
found in the same way: defining new operators $A^\pm_q$ as in (7.4.13) with $q$
such that $e^{iqN} = 1$ one obtains a representation like (7.4.15) which is now
only correct if restricted to the space $H^-$.

Therefore the problem has been reduced to the diagonalization of $\frac N2$
$2\times2$ matrices obtained by restricting each $P_q$, $q > 0$, to the space
spanned by the two vectors $A^+_qA^+_{-q}|\Omega\rangle$, $|\Omega\rangle$. The
four matrix elements of $P_q$ on such vectors can be evaluated easily starting
from the remark that

$$P_q = e^{2\beta J\cos q}\, e^{\beta J(\tau^z\cos q + \tau^x\sin q)}\, e^{2J^*\tau^z}\, e^{\beta J(\tau^z\cos q + \tau^x\sin q)} \tag{7.4.17}$$

where

$$\tau^z = (A^+_qA^-_q + A^+_{-q}A^-_{-q} - 1),\quad \tau^x = (A^+_qA^+_{-q} + A^-_{-q}A^-_q),\quad \tau^y = \frac1i (A^+_qA^+_{-q} - A^-_{-q}A^-_q) \tag{7.4.18}$$

and the matrices $\tau^x, \tau^y, \tau^z$ have the same matrix elements as
$\sigma^x, \sigma^y, \sigma^z$ in (7.4.1), respectively, if the vectors
$A^+_qA^+_{-q}|\Omega\rangle$, $|\Omega\rangle$ are identified with
$\binom10$ and $\binom01$; hence they satisfy the same commutation and
multiplication relations and have the same spectrum.
By using the properties of the Pauli matrices it is possible, with some
obvious algebra, to rewrite the product (7.4.17) as

$$P_q = e^{2\beta J\cos q}\big(\cosh\varepsilon(q) + (\tau^z\cos\vartheta_q + \tau^x\sin\vartheta_q)\sinh\varepsilon(q)\big) \tag{7.4.19}$$

where $\varepsilon(q)$ is the positive solution of

$$\cosh\varepsilon(q) = \cosh 2J^*\cosh 2\beta J + \sinh 2J^*\sinh 2\beta J\cos q \tag{7.4.20}$$

and the angle $\vartheta_q$ is defined, setting $t = \tanh\beta J$ and
$t^* = \tanh J^*$, by

$$e^{2i\vartheta_q} = \frac{(t^* + t\,e^{iq})(1 + tt^*e^{iq})}{(t^* + t\,e^{-iq})(1 + tt^*e^{-iq})}. \tag{7.4.21}$$
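
The “obvious algebra” can be delegated to a short numerical check. In the sketch below (our own illustration) the matrices $\tau^z,\tau^x$ are represented by the Pauli matrices $\sigma^z,\sigma^x$, the common factor $e^{2\beta J\cos q}$ is dropped from both sides, and $\vartheta_q$ is taken as the phase of $(t^*+te^{iq})(1+tt^*e^{iq})$. This fixes the branch left open by (7.4.21) and rests on our reading of that formula.

```python
import cmath
import math

def exp_tau(strength, angle):
    """exp(strength * (tau^z cos(angle) + tau^x sin(angle))) as a 2x2 matrix."""
    ch, sh = math.cosh(strength), math.sinh(strength)
    return [[ch + sh * math.cos(angle), sh * math.sin(angle)],
            [sh * math.sin(angle), ch - sh * math.cos(angle)]]

def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dual_coupling(beta_j):
    """J* defined by tanh J* = exp(-2 beta J)."""
    return math.atanh(math.exp(-2.0 * beta_j))

def epsilon(beta_j, q):
    """Positive solution of (7.4.20)."""
    js = dual_coupling(beta_j)
    return math.acosh(math.cosh(2 * js) * math.cosh(2 * beta_j)
                      + math.sinh(2 * js) * math.sinh(2 * beta_j) * math.cos(q))

def theta(beta_j, q):
    """Angle theta_q, taken as the phase of the numerator of (7.4.21)."""
    t, ts = math.tanh(beta_j), math.tanh(dual_coupling(beta_j))
    z = cmath.exp(1j * q)
    return cmath.phase((ts + t * z) * (1 + t * ts * z))

def max_mismatch(beta_j, q):
    """Largest entrywise difference between (7.4.17) and (7.4.19)."""
    js = dual_coupling(beta_j)
    half = exp_tau(beta_j, q)
    product_form = matmul2(matmul2(half, exp_tau(2 * js, 0.0)), half)
    # cosh(eps) + (tau^z cos(theta) + tau^x sin(theta)) sinh(eps) is exp_tau(eps, theta)
    closed_form = exp_tau(epsilon(beta_j, q), theta(beta_j, q))
    return max(abs(product_form[i][j] - closed_form[i][j])
               for i in range(2) for j in range(2))

if __name__ == "__main__":
    for beta_j in (0.2, 0.44, 0.8):
        print(beta_j, max_mismatch(beta_j, 1.0),
              epsilon(beta_j, math.pi), 2 * abs(beta_j - dual_coupling(beta_j)))
```

The last two printed columns coincide: $\varepsilon(\pi)=2|\beta J-J^*|$, the quantity called $\kappa(\beta)$ in §7.2, which vanishes at $\beta_c$.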


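Coming back to $P_q$, one sees that on the two-dimensional space spanned by
$A^+_qA^+_{-q}|\Omega\rangle$ and $|\Omega\rangle$:

$$P_q = e^{2\beta J\cos q}\; e^{\varepsilon(q)(\tau^z\cos\vartheta_q + \tau^x\sin\vartheta_q)} \tag{7.4.22}$$

but since the latter expression has $A^+_q|\Omega\rangle$ and $A^+_{-q}|\Omega\rangle$ as eigenvectors
with the "correct" eigenvalues $e^{2\beta J\cos q}$ we see that $P_q$ is given by (7.4.22)
on the entire four-dimensional space $D_q$. Hence we must diagonalize:

$$P_q = e^{2\beta J\cos q}\exp\Big[\varepsilon(q)\Big((A^+_qA^-_q + A^+_{-q}A^-_{-q} - 1)\cos\vartheta_q + (A^+_qA^+_{-q} + A^-_{-q}A^-_q)\sin\vartheta_q\Big)\Big] \tag{7.4.23}$$

Note that if one sets (for $\varphi_q + \varphi_{-q} = 0$)

$$A^+_q = B^+_q\cos\varphi_q + B^-_{-q}\sin\varphi_q,\qquad A^-_{-q} = -B^+_q\sin\varphi_q + B^-_{-q}\cos\varphi_q \tag{7.4.24}$$

which is an example of a well-known transformation, called the "Bogoliubov-Valatin
transformation", and $\vartheta_q = -2\varphi_q$, one finds

$$P_q = e^{2\beta J\cos q}\; e^{\varepsilon(q)(B^+_qB^-_q + B^+_{-q}B^-_{-q} - 1)} \tag{7.4.25}$$

so that the transfer matrix on $H^+$ can be written as
$A^N\exp\big(\sum_q(\beta J\cos q + \varepsilon(q)(B^+_qB^-_q - \frac12))\big)$, see (7.4.15), noting that
$\varepsilon(q) = \varepsilon(-q)$. The $2^{N-1}$ eigenvalues of the transfer matrix on $H^+$ are,
therefore, simply given by $A^N\exp\big(\frac12\sum_q\varepsilon_q\big)$, $\varepsilon_q = \pm\varepsilon(q)$, where the signs
can be arbitrarily chosen as long as there is an even number of + signs (because
the only relevant eigenvalues are those associated with an even number of
fermions). The terms $\beta J\cos q$ disappear because of the trigonometric identity
$\sum_q\cos q = 0$.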
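The counting of the eigenvalues on $H^+$ can be made concrete for a small lattice. The sketch below is illustrative only: it assumes that the admissible wave numbers on $H^+$ are the $N$ solutions of $e^{iqN} = -1$ in $(-\pi,\pi)$ with $N$ even, takes $\beta J$ and $J^*$ as free numerical inputs, works with the logarithm of the eigenvalues divided by $A^N$, and uses hypothetical function names.

```python
import math
from itertools import product


def epsilon(q, beta_j, j_star):
    """Positive solution of (7.4.20)."""
    c = (math.cosh(2 * j_star) * math.cosh(2 * beta_j)
         + math.sinh(2 * j_star) * math.sinh(2 * beta_j) * math.cos(q))
    return math.acosh(max(c, 1.0))


def wave_numbers(n_sites):
    """The N values of q in (-pi, pi) with exp(i*q*N) = -1 (N even)."""
    return [math.pi * (2 * k + 1) / n_sites
            for k in range(-n_sites // 2, n_sites // 2)]


def log_eigenvalues(n_sites, beta_j, j_star):
    """log(eigenvalue / A^N) for every admissible choice of signs."""
    eps = [epsilon(q, beta_j, j_star) for q in wave_numbers(n_sites)]
    logs = []
    for signs in product((1, -1), repeat=n_sites):
        if signs.count(1) % 2 == 0:          # even number of + signs
            logs.append(0.5 * sum(s * e for s, e in zip(signs, eps)))
    return logs


def largest_per_site(n_sites, beta_j, j_star):
    """(1/N) * sum over q > 0 of epsilon(q)."""
    return sum(epsilon(q, beta_j, j_star)
               for q in wave_numbers(n_sites) if q > 0) / n_sites


N, BETA_J, J_STAR = 6, 0.3, 0.6
print(abs(sum(math.cos(q) for q in wave_numbers(N))))      # identity sum cos q = 0
logs = log_eigenvalues(N, BETA_J, J_STAR)
print(len(logs) == 2 ** (N - 1))                           # number of eigenvalues
print(abs(max(logs) - N * largest_per_site(N, BETA_J, J_STAR)))
for n in (8, 64, 512):
    print(n, largest_per_site(n, BETA_J, J_STAR))
```

The last loop shows how $\frac1N\sum_{q>0}\varepsilon(q)$ settles to a limit as $N$ grows: the sum is a midpoint rule for $\frac{1}{2\pi}\int_0^\pi\varepsilon(q)\,dq$.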

There is a unique vector $|\Omega'\rangle$ such that $B^-_q|\Omega'\rangle = 0$ for all $q > 0$, and it
is a linear combination of the vectors obtained by applying $\prod_i A^+_{q_i}A^+_{-q_i}$ to
$|\Omega\rangle$: this means that the "new vacuum" $|\Omega'\rangle$ is in the even space $H^+$.

Clearly the largest eigenvalue is $A^N\exp\sum_{q>0}\varepsilon(q)$ and it corresponds to the
eigenvector $\prod_q B^+_q|\Omega'\rangle$ which, being in $H^+$, really corresponds to an
eigenvector of the transfer matrix.


A similar calculation can be performed to find the eigenvalues of the
transfer matrix on $H^-$: this time one takes $q$ such that $e^{iqN} = 1$ and proceeds
in the same way getting an analogous result. The analysis of the relation
between the maximal eigenvalues in $H^+$ and $H^-$ is a simple matter: however
it is very illuminating as it reveals that if $\beta < \beta_c$, with $\beta_c$ defined
by $\beta_cJ = J^*(\beta_c)$, there is a gap between the two eigenvalues (which does
not shrink to 0 as $N\to\infty$) while for $\beta > \beta_c$ (i.e. at low temperature)
the difference between the two eigenvalues tends to 0 exponentially fast
as $N\to\infty$. In the end this can be seen as the reason for the different
asymptotic behavior of the correlations in (7.2.3).

In the analysis on $H^-$ one must pay attention to the fact that now the
values $q = 0,\pi$ are permitted but they are not paired as the other values
of $q$. This implies that the modes $q = 0,\pi$ must be treated differently, and
the key difference that a careful discussion yields is that the value of $\varepsilon(\pi)$
is not necessarily $> 0$: it becomes negative for $T < T_c$ and this accounts for
the difference in the spectrum of the transfer matrix above and below $T_c$.
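The change of sign of $\varepsilon(\pi)$ at the critical point can be illustrated numerically. The sketch below rests on an assumption that is not restated in this section: that $J^*$ is related to $\beta J$ by the usual duality relation $\tanh J^* = e^{-2\beta J}$. At $q = \pi$ equation (7.4.20) reads $\cosh\varepsilon(\pi) = \cosh(2J^* - 2\beta J)$, and the signed solution is taken here to be $2(J^* - \beta J)$, which is presumably the choice meant in the text. Function names are hypothetical.

```python
import math


def j_star(beta_j):
    """Dual coupling from the ASSUMED relation tanh(J*) = exp(-2*beta*J)."""
    return math.atanh(math.exp(-2.0 * beta_j))


def critical_coupling(lo=0.1, hi=2.0, iterations=200):
    """Solve beta*J = J*(beta) by bisection on [lo, hi]."""
    def gap(k):
        return k - j_star(k)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gap(lo) * gap(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def signed_epsilon_pi(beta_j):
    """Signed solution 2*(J* - beta*J) of (7.4.20) at q = pi."""
    return 2.0 * (j_star(beta_j) - beta_j)


k_c = critical_coupling()
# under the assumed duality relation the root satisfies sinh(2*k_c) = 1
print(k_c, 0.5 * math.log(1.0 + math.sqrt(2.0)))
for k in (0.3, k_c, 0.6):
    print(k, signed_epsilon_pi(k))
```

With this convention $\varepsilon(\pi)$ is positive for $\beta J$ below the root, vanishes at it, and is negative above it, i.e. for $T < T_c$.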

With the above explicit expressions for the eigenvalues it is not difficult
to see that in the limit as $N\to\infty$ the only eigenvalues that count are
the two with maximum modulus, which are almost degenerate if $\beta > \beta_c$
and separated by a gap of order $O(1)$ for $\beta < \beta_c$. This means that the
limit as $N\to\infty$ of (7.1.8) is dominated by the largest eigenvalue $\lambda_+$ and
the free energy $\beta f(\beta)$ is always (for all $\beta$) given by $\log A$ plus the limit of
$\frac1N\sum_{q>0}\varepsilon(q)$ which is the integral in (7.2.1) as a consequence of (7.4.20).

The calculation of the spontaneous magnetization is more involved (see 
[Ya52], it is, however, quite simple in the method of [SML64]), and it requires 
extra assumptions, which can be removed by using further arguments as 
mentioned in Ch.VI, [BGJS72], [AM73]. 

We see that the above calculation requires some wit but its real difficulty is 
to realize that the transfer matrix can be written essentially as a quadratic 
form in certain fermionic operators. This reduces the problem to a 4 x 4 
matrix diagonalization problem; after that it is clear that the problem is 
“solved”, although some computations are still necessary to get a really 
explicit expression for the free energy. 

The above calculation should be performed in all its details by those inter- 
ested in statistical mechanics, as it is one of the high points of the theory, 
in spite of its apparent technicality. There are alternative ways to compute 
the free energy of the two-dimensional Ising model, but all of them have 
a key idea and a lot of obvious technicalities that accompany it. The one 
above is particularly interesting because of its connection with the quantum 
theory of fermionic systems (and because of its simplicity). 


§7.5. The Six Vertex Model and Bethe's Ansatz

We consider an $N\times M$ periodic lattice ($N, M$ even) and on each bond we
put an arrow so that the arrows entering or leaving each lattice node form
a configuration $c$ among the A, B, C in the Figure in §7.3. Let $\varepsilon(c)$ depend
on $c$ but suppose that it takes the same values in each pair in A or in B or
in C; the value of $\varepsilon(c)$ will be the contribution to the energy of an arrow
configuration. The partition function will be


$$Z = \sum_{C} e^{-\beta\sum_{c}\varepsilon(c)} \tag{7.5.1}$$

where the sum in the exponent runs over all vertices $c$ and the outer sum over
all the arrow configurations $C$ compatible with the six-vertex constraint and
which can be put on the $NM$ vertices.

Consider first the case $\varepsilon(c) = 0$ (the ice model). Given a configuration $C$
consider the $M$ rows of vertical arrows. We associate a number $\sigma_{mj} = \pm1$
to the $j$-th bond ($j = 1,\ldots,N$) on the $m$-th row ($m = 1,\ldots,M$) indicating
whether the arrow points up (+) or down (−). Let $\sigma_m = (\sigma_{m1},\ldots,\sigma_{mN})$.

Given $\sigma_1,\ldots,\sigma_M$, i.e. the collection of vertical arrow configurations, there
will be in general several horizontal arrow settings that will be compatible
with $\sigma_1,\ldots,\sigma_M$, i.e. which together with the vertical arrows form an
allowed configuration around each vertex. Of course the set of horizontal
configurations that can be between two rows $\sigma_m$ and $\sigma_{m+1}$ depends solely
upon $\sigma_m,\sigma_{m+1}$. Hence we can define


$$T(\sigma,\sigma') = \{\text{number of horizontal configurations allowed between two rows of vertical arrows } \sigma \text{ and } \sigma'\}, \tag{7.5.2}$$

and for instance in this case with $\varepsilon(c) = 0$ it is:

$$Z = \sum_{\sigma_1,\ldots,\sigma_M} T(\sigma_1,\sigma_2)\,T(\sigma_2,\sigma_3)\cdots T(\sigma_M,\sigma_1). \tag{7.5.3}$$

Therefore the free energy (which in the latter example is the so-called
residual entropy, i.e. a number that measures how many configurations are
possible for the system) is

$$\lim_{N,M\to\infty}\frac{1}{NM}\log{\rm Tr}\,T^M = \lim_{N\to\infty}\frac1N\log\lambda_{\max}. \tag{7.5.4}$$


One checks, see Fig. 7.3.1, that

$$T(\sigma,\sigma) = 2,\qquad T(\sigma,\sigma') = 0, 1\quad\text{if}\ \sigma\ne\sigma'. \tag{7.5.5}$$

If $\sigma = (\sigma_1,\ldots,\sigma_N)$ and $\sigma' = (\sigma'_1,\ldots,\sigma'_N)$ and if $1\le x_1 < x_2 < \ldots < x_n\le N$
are the $n$ labels for which $\sigma_{x_j} = -1$ and $1\le x'_1 < x'_2 < \ldots < x'_{n'}\le N$
the $n'$ labels for which $\sigma'_{x'_j} = -1$, then $T(\sigma,\sigma') = 1$ only when $n = n'$ and
one of the following two chains of inequalities holds:

$$1\le x_1\le x'_1\le x_2\le\ldots\le x_n\le x'_n\le N,\qquad 1\le x'_1\le x_1\le x'_2\le\ldots\le x'_n\le x_n\le N. \tag{7.5.6}$$

Hence we see that $T$ is divided into blocks which do not mix spaces with
different "total spin", i.e. one must have $\sum_j\sigma_j = N - 2n = N - 2n' = \sum_j\sigma'_j$
to have $T(\sigma,\sigma')\ne0$.
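The properties (7.5.5) and the block structure of $T$ can be verified by brute force on a small ring. The sketch below is only an illustration under explicit conventions that are assumptions of this example: vertical arrows are $+1$ when pointing up, horizontal arrows are $+1$ when pointing right, `h[j]` is the bond to the left of vertex `j`, and the ice rule (two arrows in, two out) is written as the conservation law $\sigma_j + h_j = \sigma'_j + h_{j+1}$. The function names are hypothetical.

```python
from itertools import product


def transfer_entry(sigma, sigma_p):
    """Number of periodic rows of horizontal arrows compatible with the row
    sigma of vertical arrows below and the row sigma_p above."""
    n = len(sigma)
    count = 0
    for h in product((1, -1), repeat=n):
        # ice rule at vertex j: flux in from below and left equals
        # flux out above and to the right
        if all(sigma[j] + h[j] == sigma_p[j] + h[(j + 1) % n]
               for j in range(n)):
            count += 1
    return count


def transfer_matrix(n):
    """All rows of n vertical arrows and the matrix T(sigma, sigma')."""
    rows = list(product((1, -1), repeat=n))
    return rows, [[transfer_entry(s, sp) for sp in rows] for s in rows]


def trace_of_power(t, m):
    """Tr T^m, i.e. the sum (7.5.3) for m rows with periodic conditions."""
    size = len(t)
    power = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(m):
        power = [[sum(power[i][k] * t[k][j] for k in range(size))
                  for j in range(size)] for i in range(size)]
    return sum(power[i][i] for i in range(size))


rows, T = transfer_matrix(4)
print(all(T[i][i] == 2 for i in range(len(rows))))
print(trace_of_power(T, 4))   # number of ice configurations on a 4 x 4 torus
```

For larger $N$ the enumeration over all horizontal rows becomes expensive; the point here is only to make the definition (7.5.2) operational.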

Fixed $n$ (i.e. the total spin) we can consider the matrix $T^{(n)}$ obtained by
restricting $T$ to the space generated by the unit vectors that we can label
$|\sigma\rangle$ with $\sum_j\sigma_j = N - 2n$. This can therefore be regarded as an operator
on the functions $f(x_1,\ldots,x_n)$ with $1\le x_1 < x_2\ldots < x_n\le N$:

$$(T^{(n)}f)(x_1,\ldots,x_n) = \sum^{*}_{1\le y_1\le x_1}\ \sum_{x_1\le y_2\le x_2}\cdots\sum_{x_{n-1}\le y_n\le x_n} f(y_1,\ldots,y_n) + \sum^{*}_{x_1\le y_1\le x_2}\ \sum_{x_2\le y_2\le x_3}\cdots\sum_{x_n\le y_n\le N} f(y_1,\ldots,y_n) \tag{7.5.7}$$

where the $*$ denotes that the terms in which there are at least two equal $y$'s
must be considered absent, see the following footnote 5.

If $\lambda_n$ is the largest eigenvalue of $T^{(n)}$ then $\lambda_{\max} = \max_{0\le n\le N}\lambda_n$. One
can, however, show that in the most interesting cases $\lambda_{\max} = \lambda_{N/2}$.


Bethe's ansatz is that the eigenvector with the largest eigenvalue of the
matrix $T^{(n)}$ is a linear combination of plane waves:

$$f(x_1,\ldots,x_n) = \sum_P A_P\, e^{i\sum_{j=1}^n k_{p_j}x_j} \tag{7.5.8}$$

where the sum runs over the $n!$ permutations $P = (p_1,\ldots,p_n)$ of the $n$
indices and $k_1,\ldots,k_n$ are $n$ distinct "wave numbers".

The idea of trying to find eigenvectors of the form (7.5.8) appeared first in
the work of Bethe, [Be31], who found that the eigenvectors of the matrix $H$
defining the one-dimensional Heisenberg model could be expressed in that
form.⁵


⁵ The Heisenberg model on the lattice $1,2,\ldots,N$ with periodic boundary conditions is
an operator written in terms of the matrices defined in (7.4.2) and it is:

$$H = \sum_{j=1}^{N}\big(J_x\sigma^x_j\sigma^x_{j+1} + J_y\sigma^y_j\sigma^y_{j+1} + J_z\sigma^z_j\sigma^z_{j+1}\big),\qquad \sigma_1\equiv\sigma_{N+1} \tag{$*$}$$

which can be regarded as a matrix acting on the vectors in (7.4.3) which can be denoted
$|\sigma\rangle\stackrel{def}{=}|x_1,\ldots,x_n\rangle$ if $x_1,\ldots,x_n$ are the lattice points where $\sigma_{x_j} = -1$. Thus if the
generic vector is written $\sum_{x_1<\ldots<x_n} f(x_1,\ldots,x_n)|x_1,\ldots,x_n\rangle$ the operator $H$ becomes
a matrix acting on the same space as the transfer matrix (7.5.7). If $J = J_x = J_y$ a
key remark is that given $T$ the matrix $H$ commutes with $T$ if the coefficient $J_z/J = \Delta$
is suitably chosen as a function of the parameters $a, b, c$ of the six-vertex model:
$\Delta = \frac{a^2+b^2-c^2}{2ab}$. Then, see (7.3.4) and (7.3.5), $\Delta = \frac12$ for the ice model, $\Delta = 1 - \frac12e^{2\beta\varepsilon}$ in
the case of the F-model and $\Delta = \frac12e^{\beta\varepsilon}$ for the KDP model.

Bethe's eigenvectors were not immediately useful, not even to compute the
actual value of the lowest eigenvalue of H (ground state energy) because the 
coefficients Ap were difficult to treat and their evaluation required solving 
a linear integral equation which, at the time, could not be studied. 


Wave functions of the form (7.5.8) turned out to be very useful also in other 
problems: the “new era” started when it was shown, in [LL63], that the 
ground state energy of a simple one-dimensional Bose gas with a nontrivial 
interaction corresponded to an eigenvector of the form (7.5.8). 


In the papers [YY66] a complete study of the integral equation necessary for
the evaluation of the ground state of the Heisenberg model was presented,
and in [Li67a] it was discovered that one could find eigenvectors of (7.5.7) of
the form (7.5.8), among which one with $f(x_1,\ldots,x_n) > 0$ corresponding to
$n = N/2$ when $N$ is even, as usually assumed. From this Lieb was able to
show, in the ice model case, that the largest eigenvalue of $T$ was precisely
the same as that which gave the ground state energy of a corresponding
Heisenberg model (with suitably chosen couplings, see footnote 5) and whose
form had become known by the key work [YY66].

The commutation property, cited in footnote 5, between the ice model
transfer matrix and the Heisenberg model Hamiltonian with $\Delta = \frac12$ was
also a consequence of the results on the ice model. This becomes clear as
soon as one realizes that the two matrices have the same eigenvectors (all
of the Bethe ansatz form, (7.5.8)).

The knowledge of the ground state eigenvalue for the ice model transfer 
matrix led to the explicit evaluation of the ice model residual entropy and, 
shortly afterwards, the works [Li67b],[Li67c], and [Su67] determined the 
largest eigenvalue of the matrix T in the F and KDP-models. 

It would be fairly easy to reproduce the work of [Be31] to see that the
matrix $H$, written in the basis of footnote 5, admits eigenvectors of the
form (7.5.8). And the remark that the Heisenberg model matrix $H$ and the
transfer matrix $T$ for the above considered 6V models commute (if, of course, $\Delta$
is suitably chosen) implies that $T$ has eigenvectors of the form (7.5.8) and
provides a way for finding the solution of the model: this, however, was
not the path followed in the discovery of the ice model solution.

A direct check that one can adjust $A_P$, $k_j$ in (7.5.8) to make them
eigenvectors of the matrix (7.5.7), [Li67a], is possible and very instructive,
although it is surprisingly difficult to write in words. The procedure suggested
in [Ba82] is perhaps the simplest.

Having noted that $T$ decomposes into blocks that do not mix vectors in the
spaces generated by the basis elements $|x_1,\ldots,x_n\rangle$ (i.e. the functions of
$x'_1 < \ldots < x'_m$ vanishing unless $m = n$ and $x'_i = x_i$, when their value is 1, see
footnote 5) with different $n$'s one studies first the case $n = 0$ (trivial), then
$n = 1$ (also very easy), then $n = 2$ which is easy but requires attention as the
algebra is already quite involved. The calculation is strongly recommended,
and one should first attempt it in the ice model case $\Delta = \frac12$.

For the ice model (or the F and KDP-models, which are not harder) one
finds

$$\frac{A_{12}}{A_{21}} = -\frac{1 - 2\Delta z_1 + z_1z_2}{1 - 2\Delta z_2 + z_1z_2},\qquad z_j = e^{ik_j},\qquad z_1^N = \frac{A_{12}}{A_{21}},\qquad z_2^N = \frac{A_{21}}{A_{12}}. \tag{7.5.9}$$

If one has really performed the calculation at least in the $\Delta = \frac12$ case, with
the necessary patience, then: "the solution of the eigenvalue problem for
arbitrary n is a straightforward generalization of the n = 2 case", p.138 of
[Ba82]! If $P = (p_1,\ldots,p_n)$ is the generic permutation of $1,\ldots,n$, then:


$$A_{p_1,\ldots,p_n} = (-1)^P\prod_{i<j}s_{p_j,p_i},\qquad s_{p,q}\stackrel{def}{=}1 - 2\Delta z_q + z_pz_q,\qquad z_j^N = (-1)^{n-1}\prod_{l\ne j}\frac{s_{l,j}}{s_{j,l}} \tag{7.5.10}$$

where the last relations must be interpreted as equations for $z_j$ hence, by
(7.5.9), for the $k_j$'s. The calculations also provide the value of the
eigenvalue $\lambda$ corresponding to a sequence $k_1,\ldots,k_n$ of $n$ values that are pairwise
distinct and satisfy the last of (7.5.10). For instance in the ice model case:


$$\lambda = \prod_{j=1}^{n}\frac{1}{1 - z_j} + \prod_{j=1}^{n}\frac{1}{1 - z_j^{-1}}. \tag{7.5.11}$$


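It is remarkable that if $|\Delta| < 1$ and $p, q$ are real then

$$\frac{s_{q,p}}{s_{p,q}} = \frac{1 - 2\Delta e^{ip} + e^{i(p+q)}}{1 - 2\Delta e^{iq} + e^{i(p+q)}} = e^{-i\Theta(p,q)} \tag{7.5.12}$$

where the $\Theta$'s are real and are given by

$$\Theta(p,q) = 2\,{\rm arctg}\,\frac{\Delta\sin\frac12(p-q)}{\cos\frac12(p+q) - \Delta\cos\frac12(p-q)} \tag{7.5.13}$$

which is real if $p, q$ are real. The conditions on $k_j$ become

$$2\pi I_j = Nk_j + \sum_{l=1}^{n}\Theta(k_j,k_l),\qquad I_j = j - \frac12(n+1) \tag{7.5.14}$$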


which has been discussed in [YY66] and shown to admit, for all $|\Delta| < 1$,
a unique real solution $k_1,\ldots,k_n$. Therefore, modulo mathematical rigor
problems, we expect that $k_1,\ldots,k_n$ become dense when $N\to\infty$ and
$n/N = \delta$ stays constant; the number of $k_j$'s in an interval $dk$ should
be described by a density $\rho_\delta(k)$ such that $\int_{-Q}^{Q}\rho_\delta(k)\,dk = \delta = \frac nN$. The
distribution $\rho_\delta(k)$ will be nonzero inside an interval $[-Q,Q]\subset[-\pi\delta,\pi\delta]$
(because j in (7.5.14) varies between 1 and $(n + 1)). 
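The statement that the ratio in (7.5.12) is a pure phase, with the phase given by (7.5.13), is easy to test numerically. The sketch below is illustrative only; it reads (7.5.12) as $s_{q,p}/s_{p,q} = e^{-i\Theta(p,q)}$ with $s_{p,q} = 1 - 2\Delta z_q + z_pz_q$ and $z_p = e^{ip}$, and (7.5.13) with the half angles $\frac12(p\pm q)$. The function names and the sample points are hypothetical, and the points are chosen so that the denominator in (7.5.13) does not vanish.

```python
import cmath
import math


def theta(p, q, delta):
    """Theta(p, q) of (7.5.13); assumes the denominator is nonzero."""
    num = delta * math.sin(0.5 * (p - q))
    den = math.cos(0.5 * (p + q)) - delta * math.cos(0.5 * (p - q))
    return 2.0 * math.atan(num / den)


def s(zp, zq, delta):
    """s_{p,q} = 1 - 2*Delta*z_q + z_p*z_q as in (7.5.10)."""
    return 1.0 - 2.0 * delta * zq + zp * zq


def phase_mismatch(p, q, delta):
    """| s_{q,p}/s_{p,q} - exp(-i*Theta(p,q)) |."""
    zp, zq = cmath.exp(1j * p), cmath.exp(1j * q)
    ratio = s(zq, zp, delta) / s(zp, zq, delta)
    return abs(ratio - cmath.exp(-1j * theta(p, q, delta)))


for delta in (0.5, -0.7):
    for p, q in [(0.3, -1.1), (2.0, 0.5), (-2.5, 1.7)]:
        print(delta, p, q, phase_mismatch(p, q, delta))
```

Since $\Theta(p,q) = -\Theta(q,p)$, the sum over $l$ in (7.5.14) contains no contribution from $l = j$.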


The form of (7.5.14) means that the number of $k_j$'s which are $\le k$ is such
that

$$2\pi N\int_{-Q}^{k}\rho_\delta(k')\,dk' = \pi(n+1) + Nk + N\int_{-Q}^{Q}\Theta(k,k')\rho_\delta(k')\,dk'; \tag{7.5.15}$$

hence differentiating with respect to $k$:

$$2\pi\rho_\delta(k) = 1 + \int_{-Q}^{Q}\partial_k\Theta(k,k')\rho_\delta(k')\,dk' \tag{7.5.16}$$

which is an equation whose solution for $\delta\ne\frac12$ is still an open problem.
The problem for $\delta = \frac12$ was solved in 1938 (reference to Hulthén, [Hu38],
in [Ba82]) for the case $\Delta = -1$, and in [Wa59] for the cases $\Delta < -1$. The
$|\Delta| < 1$ cases were completely solved by Yang and Yang, [YY66], for $\Delta < 1$.
The case $\Delta > 1$ is trivial, as noted by Lieb, because one can see that the
maximum eigenvalue is the one corresponding to $n = 0$ and it is $a^N + b^N$,
see [Ba82], (see (7.3.1) for the definition of $a, b$).

The reason why the case $n = \frac12N$ is so special and exactly computable
lies in the existence of a change of variables transforming (7.5.16) into a
convolution equation, in the cases $-1 < -\cos\mu\stackrel{def}{=}\Delta < 1$. The change of
variables is $k\to\alpha$:

$$e^{ik} = \frac{e^{i\mu} - e^{\alpha}}{e^{i\mu+\alpha} - 1} \tag{7.5.17}$$

or

$$\frac{dk}{d\alpha} = \frac{\sin\mu}{\cosh\alpha - \cos\mu} \tag{7.5.18}$$

which maps the interval $[-Q,Q]$ into $(-\infty,\infty)$ and $2\pi\rho_{\frac12}(k)\to R(\alpha)$ with
$R(\alpha)$ satisfying:

$$R(\alpha) = \frac{\sin\mu}{\cosh\alpha - \cos\mu} - \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{\sin2\mu}{\cosh(\alpha-\beta) - \cos2\mu}\,R(\beta)\,d\beta \tag{7.5.19}$$

which can be solved by Fourier transform, and it even leads to the simple
solution

$$\hat R(x) = \frac{1}{2\cosh\mu x} \tag{7.5.20}$$

for the Fourier transform $\hat R$ of $R$!
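One can test (7.5.20) against (7.5.19) directly. The sketch below is a numerical illustration under stated assumptions: the Fourier convention is taken to be $\hat R(x) = \frac{1}{2\pi}\int R(\alpha)e^{ix\alpha}d\alpha$, so that inverting $\hat R(x) = 1/(2\cosh\mu x)$ gives the candidate $R(\alpha) = \pi/\big(2\mu\cosh(\pi\alpha/2\mu)\big)$; (7.5.19) is read with a minus sign in front of the integral; and $\mu = 2\pi/3$ is used, which corresponds to $\Delta = -\cos\mu = \frac12$. The integral is approximated by a trapezoidal rule on a truncated interval; function names are hypothetical.

```python
import math

MU_ICE = 2.0 * math.pi / 3.0      # Delta = -cos(mu) = 1/2


def r_candidate(alpha, mu=MU_ICE):
    """Inverse transform of 1/(2*cosh(mu*x)) in the assumed convention."""
    return math.pi / (2.0 * mu * math.cosh(math.pi * alpha / (2.0 * mu)))


def rhs_of_7_5_19(alpha, mu=MU_ICE, half_width=60.0, steps=12000):
    """Right-hand side of (7.5.19) evaluated on the candidate solution."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        b = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        kernel = math.sin(2.0 * mu) / (math.cosh(alpha - b) - math.cos(2.0 * mu))
        total += weight * kernel * r_candidate(b, mu)
    integral = total * h
    return (math.sin(mu) / (math.cosh(alpha) - math.cos(mu))
            - integral / (2.0 * math.pi))


for alpha in (0.0, 0.7, 2.0):
    print(alpha, r_candidate(alpha), rhs_of_7_5_19(alpha))
```

The two printed columns should agree up to the quadrature error, which is very small here because the integrand is smooth and decays exponentially.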

The first instance in which a transformation of the type (7.5.19) is used 
to transform the integral equation (7.5.16) into a simple equation is in the 
remarkable brief paper [Wa59] which introduced the change of variables 
corresponding to (7.5.17) in the case A < —1. The latter paper is based 
upon another remarkable paper, [Or58], which studies the same equation in 
an approximate way. 

It is now a matter of simple algebra to obtain the formulae of §7.3,
(7.3.3), (7.3.4), (7.3.5), see [LW72], [Ba82].

The 8V model is not soluble by Bethe's ansatz; I cannot give here even a
sketchy account of its solution. The reader should look at Baxter's book,
[Ba82], detailing what is one of the main achievements of mathematical physics
in the 1970s. The book also illustrates several other exactly soluble models.

§8.1. Brownian Motion and Einstein's Theory


Brownian motion was first observed by Brown (1828): he recognized that
the motion of "molecules", of size $\sim10^{-3}$ cm, of pollen in a solution
("colloid") was not due to internal causes, as was believed at the time,
but had a mechanical (unknown) origin. It was in fact observable by
looking at particles of comparable size and of any material, both organic
and inorganic.

Very soon the movements were attributed to collisions with the microscopic
constituents of matter. Among the first to recognize this was Cantoni,
[Ca67], see p.93 in [Pa82], who in a remarkable paper says


“in fact, I think that the dancing movement of the extremely minute solid 
particles in a liquid, can be attributed to the different velocities that must 
be proper at a given temperature of both such solid particles and of the 
molecules of the liquid that hit them from every side. I do not know whether 
others did already attempt this way of explaining Brownian motions...” 


In this paper an impressive number of experiments, performed by Cantoni 
himself, are reported in which he finds evidence for the equipartition of en- 
ergy between the suspended particles and the solvent molecules to conclude: 


“In this way Brownian motion provides us with one of the most beautiful 
and direct experimental demonstrations of the fundamental principles of the 
mechanical theory of heat, making manifest the assiduous vibrational state 
that must exist both in liquids and solids even when one does not alter their 
temperature”. 


This work is most remarkable also in view of the fact that it is contemporary
with the first papers of Boltzmann on the heat theorem and equipartition.

Brownian motion attracted the interest of many leading scientists, among
whom was Poincaré. Brownian motion theory was worked out by Einstein
and, independently, by Smoluchowski (1905-1906), soon followed by the
experimental confirmation of Perrin (1908), see [Ei56], [VS06], [Pe70].

The main critique (Nägeli, [VN79]) of the microscopic kinetic nature of
Brownian motion was the remark that experimental data and kinetic theories
permitted one to estimate that (the particles in suspension being hundreds
of millions of times larger than the molecules of the liquid) the velocity
variations at each collision had a random sign; so that it seemed inconceivable
that one could see a nonvanishing average effect. A fallacious argument,
as was stressed (for instance) by Poincaré (1904), [Po00]. He also noted,
with others, that the hypothesis that the motion of the colloidal particles
had a kinetic nature could contradict thermodynamics (see below).

The fallacy of the reasoning was, in any event, well known as it appears
from an esoteric article by Bachelier (1900), [Ba00], in the austere Annales
de l'Ecole Normale of Paris (a few pages away from the French translation
of Hilbert's Grundlagen der Geometrie); it is a paper on stock market
speculations where, precisely, the problem is posed of how many random
small variations can produce visible effects (see below).

Einstein's approach starts with the remark that the suspended particles,
although of huge size, can still be considered as large molecules and, hence,
one can apply statistical mechanics to them, so that they will exercise osmotic
pressure, just as with ordinary solutions, satisfying (therefore) Raoult-van
't Hoff's law at least at small concentrations. Hence van 't Hoff's law holds
not only for solutions of microscopic particles but also for calculating the
partial pressure due to particles of arbitrary sizes (e.g. glass balls).

The idea was "revolutionary" and, as Einstein realized, possibly in contrast
with classical thermodynamics, but not with statistical mechanics and the
atomic hypothesis. Hence he immediately posed the question of how to find
observable macroscopic consequences.

Nonrectilinear motion of particles is thus attributed to their random
collisions with molecules. Hence it is a random motion, at least when observed
on time scales $\tau$ large compared to the time necessary to dissipate the
velocity $v$ acquired in a single collision with a molecule (by friction, due also
to microscopic collisions between fluid molecules). The dissipation of such
velocity can be estimated, in the case of macroscopic particles, by remarking
that in a single collision with a molecule the acquired speed $v$ is dissipated
into heat by the action of a force $F$ which, by Stokes' law, is

$$m\frac{dv}{dt} = F = -6\pi\eta Rv \tag{8.1.1}$$

where $\eta$ is the fluid viscosity coefficient, $R$ is the radius of the suspended
particles, and $v$ the speed; hence the characteristic time scale for the loss
of the velocity acquired in a single collision is $t_0 = (6\pi\eta m^{-1}R)^{-1}$. This
is a very short time: for instance if $R = 1\mu$, and if $m$ is evaluated by
assuming that the density of the material constituting the large particles is
the same as that of the liquid in which they are suspended (i.e. water, so
that $\eta = 10^{-2}$ cgs-units), one realizes that the time scale is $t_0\approx10^{-7}$ sec.
Therefore on the time scale $\tau\gg t_0$ motion will be diffusive. In such motions
there is transport of matter only when there is a density gradient.
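The order of magnitude of $t_0$ can be reproduced in a few lines. The sketch below is only an illustration: it assumes a spherical particle of radius $R = 10^{-4}$ cm with the density of water (1 g/cm³) and takes $\eta = 10^{-2}$ g cm⁻¹ s⁻¹, the usual cgs value for the viscosity of water at room temperature; the function name is hypothetical.

```python
import math


def relaxation_time(radius_cm, density_g_cm3, viscosity_cgs):
    """t0 = m / (6*pi*eta*R) for a sphere of given radius and density."""
    mass = density_g_cm3 * 4.0 / 3.0 * math.pi * radius_cm ** 3
    return mass / (6.0 * math.pi * viscosity_cgs * radius_cm)


# assumed values: R = 1 micron, water density and water viscosity (cgs units)
t0 = relaxation_time(1.0e-4, 1.0, 1.0e-2)
print(t0)   # seconds
```

For a sphere the formula reduces to $t_0 = \frac{2}{9}\rho R^2/\eta$, so that $t_0$ grows with the square of the radius.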

The logic of Einstein's analysis is quite fascinating. Using an ideal
experiment (a method characteristic of his thinking) he links microscopic
quantities to macroscopic ones. The background solvent fixes the temperature and
the time scale over which a particle undergoes a diffusive motion: to find the
diffusion coefficient for a single particle one considers a gas of particles of
arbitrary density $\nu$, but so small that the ideal osmotic pressure law holds.
We recall Raoult-van 't Hoff's law: if $p$ is the osmotic pressure (i.e. the
partial pressure due to the particles) and $\nu$ is their numerical density one has
$p = k_BT\nu$, with $T$ being the fluid temperature and $k_B$ Boltzmann's constant.
This is done in spite of the fact that in the classical experiments the
suspended particles are so few that they can be considered isolated from
each other.

Hence in the first step of the ideal analysis one replaces a single colloidal
particle with a gas of such particles, with density $\nu$.

One then imagines that some external force $F$ acts on the gas (also a fiction,
in the ideal experiment) which only acts upon the colloidal particles (and
not on the solvent fluid); it generates, in a stationary state, a gradient $F\nu$
of pressure, denoted $\partial_xp$. The pressure gradient also equals $k_BT\partial_x\nu$ by the
Raoult-van 't Hoff law (or more precisely by the assumption that the osmotic
pressure law holds for macroscopic particles), so that the pressure gradient
which exactly balances the force action is $\partial_xp = F\nu = k_BT\partial_x\nu$.

Supposing that the solvent obeys the Navier-Stokes equations one can
then compute, via Stokes' law, the particle velocity in terms of the viscosity
(strictly speaking here it is necessary that particles be in fact macroscopic)
so that one can compute the flux generated by the pressure gradient:

$$\Phi = \nu v = \frac{\nu F}{6\pi\eta R} = \frac{k_BT}{6\pi\eta R}\,\partial_x\nu. \tag{8.1.2}$$

Finally the assumption that the individual particles undergo a diffusive
motion implies that the flux has to be proportional to the density gradient
giving:

$$\Phi = -D\,\partial_x\nu \tag{8.1.3}$$

where the proportionality constant $D$ is the diffusion coefficient.

Equating the two expressions for the flux of particles (in absolute value,
because in the stationary state the flux generated by the force is balanced
by the diffusive one) all auxiliary quantities, used to mount the ideal
experiment, have disappeared and one infers that assuming kinetic theory
then a macroscopic particle (even just one) in a fluid and in thermal
equilibrium (i.e. in a stationary state) must have a diffusive motion with a
diffusion constant related to the viscosity by

$$D = \frac{k_BT}{6\pi\eta R}, \tag{8.1.4}$$

which is called the Einstein-Smoluchowski relation, and which one should
attribute entirely to Einstein, see the following §8.2.

The quantity $D$ is also directly related to the average value (over many
trajectories) $\langle r(t)^2\rangle$ of the squared displacement $r(t)^2$ of the colloidal
particle in a time interval $t$; we shall see that $\langle r(t)^2\rangle = 6Dt$. Since the value
of $\langle r(t)^2\rangle$ is directly measurable in a microscope, this is a first theoretical
relation that can be checked experimentally.
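To get a feeling for the sizes involved, the relation (8.1.4) and the law $\langle r(t)^2\rangle = 6Dt$ can be evaluated for a typical colloidal particle. The numbers below are assumptions chosen for illustration (a particle of radius $10^{-4}$ cm in water at 293 K, with viscosity $10^{-2}$ in cgs units); the function name is hypothetical.

```python
import math

K_B = 1.380649e-16   # Boltzmann's constant in erg/K (cgs units)


def einstein_diffusion(temperature_k, viscosity_cgs, radius_cm):
    """Diffusion coefficient D = k_B*T / (6*pi*eta*R), in cm^2/s."""
    return K_B * temperature_k / (6.0 * math.pi * viscosity_cgs * radius_cm)


# assumed values: T = 293 K, eta = 1e-2 g/(cm s), R = 1e-4 cm
D = einstein_diffusion(293.0, 1.0e-2, 1.0e-4)
rms_after_one_second = math.sqrt(6.0 * D * 1.0)
print(D)                      # cm^2/s
print(rms_after_one_second)   # cm
```

With these inputs the root mean square displacement after one second is of the order of the particle radius itself, which is why the motion is visible in a microscope.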

Conceiving of macroscopic particles as behaving like microscopic molecules
(and generating an osmotic pressure obeying the Raoult-van 't Hoff law, much as
true chemical solutions do) is an important idea that was, in itself, a novelty
brought by Einstein's work (heralded by Cantoni's experiments, [Ca67]). It
allowed everybody who had not yet accepted the atomic hypothesis to see
that thermodynamics laws must have a statistical nature. This in fact
happened at least after Perrin showed, experimentally, that Brownian motion
theory was correct, [Pe70].

In fact, since it seems possible to build semi-permeable walls for macro- 
scopic objects, it becomes possible to perform thermodynamic cycles at con- 
stant temperature which, by using osmotic pressure, convert heat into work: 
the walls create an entity similar to Maxwell’s daemon. One can think of a 
cylinder filled with liquid and divided in two by a semi-permeable movable 
wall; on the left side there is also a colloidal solution seeing the impermeable 
side of the wall, while the right side contains no colloid. The wall can then 
be pushed to the right by using the osmotic pressure to extract work from 
the process; at the end of the run the wall is taken out (performing a work 
as small as we wish, in principle, because we only need to displace the wall 
horizontally) and reinserted back at the original position but with the two 
faces inverted. 

Then comes a long waiting time while the observer does nothing but witness 
the colloidal particles randomly hit the wall on the permeable side and get 
caught in the left part of the cylinder, again. After the last colloidal particle 
crosses the wall the initial conditions are restored and Carnot’s principle has 
been violated. 

The infinitely sharp eye of Maxwell’s daemon can thus be replaced by 
our microscope, as Poincaré stressed (having in mind a somewhat different 
apparatus based on the same ideas). Perrin highlighted this same aspect of 
the Brownian motion phenomenon, and he also noted that a machine like the 
above would have required unimaginably long times to extract appreciable 
amounts of energy, see §51 of [Pe70].

It is important to keep in mind that here we are somewhat stretching 
the validity of thermodynamic laws: the above machines are very idealized 
objects, like the daemon. They cannot be realized in any practical way: 
one can arrange them to perform one cycle, perhaps; (and even that will 
take forever if we want to get an appreciable amount of energy, see §51 of 
[Pe70]), but what one needs to violate the second law is the possibility of 
performing as many energy producing cycles as required (taking heat out of 
a single reservoir). Otherwise their existence “only” proves that the second 
law has only a statistical validity, a fact that had been well established since 
the work of Boltzmann. 

In fact an accurate analysis of the actual possibility of building walls
semi-permeable to colloids and of exhibiting macroscopic violations of the second
principle runs into grave difficulties: it is not possible to realize a perpetual
motion of the second kind by using the properties of Brownian motion. It is
in fact possible to obtain a single violation of Carnot's law (or a few of them),
of the type described by Perrin, but as time elapses and the machine is left
running, isolated and subject to physical laws with no daemon or other ideal
extraterrestrial being intervening (or performing work unaccounted for),
the violations (i.e. the energy produced per cycle) vanish because the cycle
will be necessarily performed as many times in one direction (apparently
violating Carnot's principle and producing work) as in the opposite direction
(using it).

This is explained in an analysis of Feynman, see [Fe63], vol. 1, §46, where
the semi-permeable wall is replaced by a wheel with an anchor mechanism,
a "ratchet and a pawl", allowing it to rotate only in one direction under the
impulses communicated by the colloidal particle collisions with the vanes
of a second wheel rigidly bound to the same axis. Feynman's analysis is
really beautiful, and remarkable as an example of how one can still say
something nonboring about perpetual motion. It also brings important
insights into the related so-called "reversibility paradox" (that microscopic
dynamics generates an irreversible macroscopic world).

Diffusive motion produces a displacement $r(t)$ over a time $t$ whose squared
average is $\langle r^2\rangle(t) = 6Dt$, because the probability $f(x,t)d^3x$ for finding a
particle, initially located at the origin, in the little cube $d^3x$ around $x$ is the
solution of the diffusion equation $\partial_tf(x,t) = D\Delta f(x,t)$, i.e.

$$f(x,t) = \frac{e^{-x^2/4Dt}}{(4\pi Dt)^{3/2}} \tag{8.1.5}$$

(the equation that Einstein derives by imitating Boltzmann's method to obtain
the Boltzmann equation, also finding, at the same time, a microscopic
expression for the diffusion coefficient $D$). The squared average value of the
displacement is then simply:

$$\langle r(t)^2\rangle = \int x^2f(x,t)\,d^3x = 6Dt. \tag{8.1.6}$$

We see that, although each collision produces a very small velocity variation,
immediately followed by variations of similar size and of either sign,
nevertheless the particle undergoes a motion that over a long time (compared
to the time between collisions) leads to a change of each coordinate
of the order of $\sqrt{2Dt}$ (or $\sqrt{6Dt}$ if one looks at the three-dimensional
variation) which not only is nonvanishing, but can also be considerably large
and observable.
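The way in which many small increments of random sign add up to an observable displacement can be illustrated by a small simulation. The sketch below is not Einstein's derivation: it simply assumes that over each time step $dt$ every coordinate receives an independent Gaussian increment of variance $2D\,dt$, which is the discretized form of the diffusion law, and estimates $\langle r^2\rangle$ over many paths. The function name and the parameter values are hypothetical.

```python
import math
import random


def mean_square_displacement(diffusion, dt, n_steps, n_paths, seed=0):
    """Monte Carlo estimate of <r(t)^2> at t = n_steps*dt in three dimensions."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * diffusion * dt)   # standard deviation per coordinate
    total = 0.0
    for _ in range(n_paths):
        x = y = z = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, sigma)
            y += rng.gauss(0.0, sigma)
            z += rng.gauss(0.0, sigma)
        total += x * x + y * y + z * z
    return total / n_paths


D, DT, STEPS = 2.0e-9, 0.02, 50          # illustrative values, cgs units
estimate = mean_square_displacement(D, DT, STEPS, n_paths=2000)
print(estimate, 6.0 * D * STEPS * DT)    # simulated value and 6*D*t
```

The two printed numbers agree within the statistical error of the sample, which decreases like the inverse square root of the number of paths.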

As an application Einstein deduced (1906) the value of Boltzmann's constant
$k_B$, hence of Avogadro's number $N_A$, from the measured diffusion of
sugar suspended in water, finding $N_A = 4.0\times10^{23}$: the error being mainly
due to a computational mistake. On the basis of accurate experiments,
by using the theory of Einstein, Perrin and collaborators obtained a value
essentially equal to the recently accepted value of $N_A$, see §77 of [Pe70].

Brownian motion theory was derived by Einstein without him being really
familiar with the details of the experiments that had been performed for
about 80 years. He proceeded deductively, relying on ideal experiments,
starting from the remark that particles, even if of macroscopic size, had to
obey the laws of statistical mechanics. In particular they had to show energy
equipartition and their osmotic pressure had to obey the perfect gas law
(Raoult-van 't Hoff's law).

His theory leads manifestly to motions that, if observed over time scales
long compared to $t_0$ (cf. lines following (8.1.1)),¹ must be motions for which
the velocity would depend on the time interval over which it is measured and
it would diverge in the limit $t\to0$ or, better, it would become extremely
large and fluctuating as $t$ approaches the time scale $t_0$ beyond which the
theory becomes inapplicable.

It provided, therefore, an example of an actual physical realization of
certain objects that had until then been just mathematical curiosities, like
continuous but nondifferentiable curves, discovered in the 1800s by
mathematicians in their quest for a rigorous formulation of calculus; Perrin himself
stressed this point very appropriately, see §68 of [Pe70].

The assumption that fluid resistance to macroscopic particle motion follows
Stokes' law is by no means essential but it is a characteristic aspect
distinguishing Einstein's theory from Smoluchowski's, as we shall see below. In
fact if the assumption was changed to $v = C_RF$ with $C_R$ suitably depending
on $R$, then (8.1.4) would be replaced by $D = k_BTC_R$. If, for instance, the
particle was suspended in a rarefied gas, rather than in an incompressible
liquid, then $C_R$ would be different.

More precisely if the colloidal particle proceeds with velocity $v$ in a gas
with density $\rho$, then the number of gas particles colliding per unit time
with an average velocity $-v_m$ is $\pi R^2(v + v_m)\rho/2$, while
$\pi R^2(v_m - v)\rho/2$ is the number of particles colliding per unit time
with an average velocity $+v_m$. Each of the former undergoes a momentum
variation $2m(v_m + v)$ and each of the latter $2m(v_m - v)$.
Hence, instead of Stokes’ law, the force of the fluid on the particle is

$$\frac{\rho}{2}\,\pi R^2\left((v + v_m)^2 - (v_m - v)^2\right)2m = c' R^2 v_m m \rho\, v \tag{8.1.7}$$

with $c' = 4\pi$.
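
The bookkeeping behind (8.1.7) can be checked numerically. What follows is only a minimal sketch under the same idealization as the text (two velocity classes, particle treated as a disk); the function names and any numerical values passed to them are illustrative assumptions, not part of the original argument.

```python
import math

def doppler_force_from_collisions(R, rho, m, v_m, v):
    """Force on a disk of radius R moving at speed v in the two-velocity gas model."""
    # molecules per unit time hitting the front face (gas velocity -v_m)
    rate_front = math.pi * R**2 * (v + v_m) * rho / 2
    # molecules per unit time hitting the rear face (gas velocity +v_m)
    rate_rear = math.pi * R**2 * (v_m - v) * rho / 2
    # momentum exchanged per collision: 2m(v_m + v) in front, 2m(v_m - v) behind
    return rate_front * 2 * m * (v_m + v) - rate_rear * 2 * m * (v_m - v)

def doppler_force_closed_form(R, rho, m, v_m, v):
    """Closed form c' R^2 v_m m rho v with c' = 4*pi, as in (8.1.7)."""
    return 4 * math.pi * R**2 * v_m * m * rho * v
```

Both functions should agree up to rounding for any $v < v_m$, since $(v + v_m)^2 - (v_m - v)^2 = 4 v v_m$.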
In the above calculation we supposed that half of the particles had velocity
equal to the absolute velocity average and half an opposite velocity;
furthermore the particle has been treated as a disk perpendicular to the direction
of motion. A more correct treatment should assume a Maxwellian velocity
distribution and a spherical shape for the particle. The evaluation of
the corrections is without special difficulties if one assumes that the gas
is sufficiently rarefied so that one can neglect the recollision phenomena
(i.e. repeated collisions between the particle and the same gas molecule),
and it leads to a final result identical to (8.1.7) but with a different factor
$c$ replacing $4\pi$. One would eventually find, following the argument leading to
(8.1.4),

$$D = \frac{k_B T}{c R^2 v_m m \rho} = \frac{k_B T}{c R^2 \rho \sqrt{2 m k_B T}} = \frac{\sqrt{k_B T}}{c R^2 \rho \sqrt{2m}} \tag{8.1.8}$$

and the constant $c$ is in fact $2\sqrt{\pi}$. For obvious reasons the regime in which
the expression (8.1.7) for the friction holds is called Doppler’s regime and
it is relevant for rarefied gases, while the Stokes regime, in which (8.1.1) is
acceptable, pertains to friction in liquids.

¹ i.e. long compared to 1 μsec, as is necessarily the case because of our human size.
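
The two regimes differ only in the mobility $C_R$ entering $D = k_B T C_R$. The sketch below makes this explicit; it is an illustration, not the author's computation. The function names are hypothetical, the constant `c` of the Doppler regime is left as a parameter, and the sample values in the usage note (a micron-sized sphere in water at room temperature) are assumptions chosen only to give an order of magnitude.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def diffusion_from_mobility(T, C_R):
    """Einstein relation D = k_B T C_R, where v = C_R F defines the mobility C_R."""
    return K_B * T * C_R

def stokes_mobility(eta, R):
    """Mobility of a sphere of radius R in a liquid of viscosity eta (Stokes regime)."""
    return 1.0 / (6 * math.pi * eta * R)

def doppler_mobility(c, R, rho, m, T):
    """Mobility in a rarefied gas: friction coefficient c R^2 rho sqrt(2 m k_B T)."""
    return 1.0 / (c * R**2 * rho * math.sqrt(2 * m * K_B * T))
```

For instance `diffusion_from_mobility(293.0, stokes_mobility(1.0e-3, 0.5e-6))` gives a value of a few times $10^{-13}\,\mathrm{m^2/sec}$ (SI units assumed throughout).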


§8.2. Smoluchowski’s Theory


Smoluchowski’s theory, shortly following that of Einstein, sheds light on
Einstein’s hypotheses. The latter lead to (8.1.5) at the price of tacit assump- 
tions similar to the molecular chaos hypothesis familiar from Boltzmann’s 
attempts at deriving Boltzmann’s equation: in fact according to (8.1.5) 
colloidal particles show a diffusive motion, with mean square displacement 
proportional to time t. 

Smoluchowski, to disprove Nägeli’s argument, considers a concrete microscopic
model for the collisions: a particle of mass $M$ is subject to a large
number of collisions with the molecules, of mass $m$, of the fluid ($\sim 10^{16}\,\mathrm{sec}^{-1}$
in many cases). If $v_k$ is the particle velocity after $k$ collisions and if the $k$-th
collision is with a molecule with velocity $v$ before colliding, one infers from
the elastic collision laws that:


$$v_{k+1} = v_k + \frac{m}{M}(R - 1)v \qquad \text{if } |v| \gg |v_k|,\ M \gg m \tag{8.2.1}$$


where $R$ is a random rotation (depending on the impact parameter, also
random). The above equation only deals with a single collision between two
particles and it does not take into account that the heavy particle moves in a
gas of light particles with positive density: this causes a cumulative friction
effect; hence when the velocity $v_k$ grows the Doppler friction in (8.1.7) will
start to damp it by a force $-c' R^2 v_m \rho m v_k$, hence with acceleration
$-\lambda v_k$ with $\lambda = c' R^2 v_m \rho m/M$. Thus the velocity variation
should rather be

$$v_{k+1} = e^{-\lambda\tau} v_k + \frac{m}{M}(R - 1)v \tag{8.2.2}$$


which includes, empirically, the damping effect.

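We can consider $(R - 1)v$ as a random vector, at each collision, with zero
average, so that the velocity kick $\frac{m}{M}(R - 1)v$ has square width
$(m/M)^2\langle((R - 1)v)^2\rangle = 2(m/M)^2 v_m^2$. Furthermore
collisions take place, on average, every time interval $\tau$ such that

$$\pi R^2 \rho\, v_m \tau = 1 \quad\Rightarrow\quad \lambda\tau = \frac{c'}{\pi}\,\frac{m}{M} \tag{8.2.3}$$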


hence the space run in a time $t$ during which $n = t/\tau$ collisions take place is
(if the initial position and velocity of the particle are both 0, for simplicity)

$$r = \sum_{k=0}^{n-1} \tau \sum_{h=0}^{k} e^{-\lambda\tau(k-h)}\, w_h \tag{8.2.4}$$

with $w_h$ independent random vectors with $\langle w_h^2\rangle = 2v_m^2 (m/M)^2$. Hence we
can immediately compute the average square displacement as²

$$\langle r^2\rangle = \tau^2\, n\, \frac{\langle w^2\rangle}{(\lambda\tau)^2} = \frac{2\pi}{c'}\,\frac{v_m}{c' R^2 \rho}\, t = 6a\,\frac{k_B T}{c' R^2 m \rho\, v_m}\, t \tag{8.2.5}$$

where $a = \pi/c'$ is a numerical constant of $O(1)$ (see below) and $c'$ was
introduced in (8.1.7); we see that we find again, essentially, Einstein’s
formula in the case in which Stokes’ law is replaced by Doppler’s resistance
of a rarefied gas, just as we would expect to find given the nature of the
model (apart from the factor $a$). One shall realize, see §8.3, that
Uhlenbeck-Ornstein’s theory is a macroscopic version of Smoluchowski’s theory.
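
The diffusive growth of the displacement in this model can be checked without any sampling. The sketch below assumes the damped recursion of (8.2.2) written as $v_{k+1} = e^{-\lambda\tau}v_k + w_k$ with $v_0 = 0$ and independent zero-average kicks $w_k$ of equal square width; since $r$ is then a linear combination of the $w_h$, its mean square is an explicit sum. Function names and parameter values are illustrative assumptions.

```python
import math

def exact_mean_square_displacement(n, lam_tau, tau, w2):
    """<r^2> for r = tau * (v_1 + ... + v_n), v_{k+1} = exp(-lam_tau) v_k + w_k, v_0 = 0."""
    total = 0.0
    for h in range(n):
        # w_h enters v_{h+1}, ..., v_n with weights 1, q, q^2, ... (q = exp(-lam_tau))
        weight = (1.0 - math.exp(-lam_tau * (n - h))) / (1.0 - math.exp(-lam_tau))
        total += weight * weight
    return tau**2 * w2 * total

def diffusive_approximation(n, lam_tau, tau, w2):
    """Leading behaviour tau^2 n <w^2> / (lam_tau)^2, valid for lam_tau << 1 << n lam_tau."""
    return tau**2 * n * w2 / lam_tau**2
```

With, say, `n = 200000` and `lam_tau = 0.01` the two expressions should differ by about one percent, the discrepancy coming from the initial transient and from replacing $1 - e^{-\lambda\tau}$ by $\lambda\tau$.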

This led Smoluchowski to say that if, instead, the fluid is an incompress- 
ible liquid one has “simply” to replace, in the denominator of (8.2.5), the 
Doppler regime viscosity with that in the Stokes regime, with a somewhat 
audacious logical leap; he thus finds: 


$$D_{\mathrm{Smol.}} = a\, D_{\mathrm{Einst.}} \tag{8.2.6}$$


Hence Smoluchowski’s theory is in a sense more ambitious than the
Einsteinian theory because it attempts to prove that the colloid motion is a
diffusive one without neglecting completely the time correlations between
consecutive collisions (which Einstein, as already mentioned, implicitly
neglects). The model proposed is to think of the fluid as a rarefied gas which,
therefore, does not obey the Stokes viscosity law. Strictly speaking
Smoluchowski’s model deals with a colloid realized in a rarefied gas, a situation not
very relevant for the experiments at the time, because it is not applicable to
a colloid realized in a liquid. Einstein’s method is more general and applies
to both cases, although it does not really provide a microscopic justification
of the diffusive nature of the motions.

Conceptually Smoluchowski could not possibly obtain Einstein’s formula 
because he was not able to produce a reasonable microscopic model of a 
fluid in the Stokes regime (which even today does not have a satisfactory 
theory). His method in fact is not very “objective” even in the rarefied gas 
case since it leads to a result for D affected by an error of a factor a with 
respect to Einstein’s. 

This factor can be attributed to the roughness of the approximations, 
mainly to the not very transparent distinction between velocity and average 
velocity in the course of the derivation of (8.2.5), which does not allow us to 
compute an unambiguously correct value for a. Nevertheless Smoluchowski, 
without the support of the macroscopic viewpoint on which Einstein was 
basing his theory, is forced to take seriously the factor a that he finds and 
to transfer it (with the logical jump noted above) to an incorrect result in 
the case of a liquid motion. 
² We use that $\frac12 m v_m^2 = \frac32 k_B T$. The factor $a$ changes if one makes a less rough theory of the
Doppler friction taking into account that there are differences between various quantities
identified in the discussion, like $\sqrt{\langle(\Delta v_k)^2\rangle}$ and $\langle|\Delta v_k|\rangle$ (which, by the Maxwellian
distribution of the velocities, modifies $c'$ hence $a$).

As in the case of the factor $c$ in (8.1.8) a more precise theory of the
collisions between molecules and colloid is possible, in which one replaces the
average values of the velocity with fluctuating values (distributed according
to the appropriate Maxwellian): in this way the value $a = 1$ arises. Had
Smoluchowski proceeded in this manner, although finding the correct result
in Doppler regime, he would have still needed a logical jump to treat the
case of a colloid in a liquid solution.

It thus appears that Smoluchowski’s theory was not comparable with the 
experimental data available at the time; this was so for intrinsic reasons 
and it, perhaps, explains why he did not publish his results before Ein- 
stein’s papers (results that, as he says, he had obtained years earlier). It is 
not impossible that by reading Einstein’s memoir he could make the above 
analyzed logical jump which was necessary in order to make a comparison 
of the theory with the experiments (which occupies only a few lines in his long
memoir). 

Later Smoluchowski abandoned the factor $a$ and adopted “Einstein’s
value” ($a = 1$).

It remains true, however, that Smoluchowski’s work is a milestone in kinetic
theory and his was among the first of a series of attempts aimed at obtaining
equations for macroscopic continua. The continua are regarded as describing
microscopic motions observed over time scales (and space scales) very
large compared to microscopic times (and distances), so that the number of
microscopic events involved in an observation made on a macroscopic scale
is so large that it can be treated by using probability theory techniques
(or equivalent methods).

The use of probability theory is the innovative feature of such theories: 
already Lagrange, in his theory of the vibrating string, imagined the string 
as composed of many small coupled oscillators: but his theory was entirely 
“deterministic”, so much as to appear artificial. 

In 1900, six years before Smoluchowski’s work, Bachelier published the 
above mentioned research, [Ba00], with the rather unappealing title of 
Théorie de la spéculation which, as is maintained by some historians, would 
have been left unappreciated because it was superseded or shadowed by 
Einstein’s 1905 paper. Bachelier’s work, it is claimed, did in fact present 
the first theory of Brownian motion. 

It is in fact only a posteriori possible to see a connection between the 
theory of fluctuations of erratic (“?”) stock market indicators and Brownian 
motion; nevertheless Bachelier’s memoir can perhaps be considered to be 
the first paper in which dissipative macroscopic equations are rigorously 
derived from underlying microscopic models. 

In his work Brownian motion is not mentioned and his model for the
evolution of list prices is that of a random increase or decrease by an amount
$\Delta x$ in a time $\Delta t$ with equal probability. The novelty with respect to the
classical error analysis is that one considers the limit in which $\Delta x$ and $\Delta t$
tend to 0 while studying the list price variations at various different times
$t$ under the assumption that they are given by partial sums of the price
variations. In classical error theory one studies only the total error, i.e. the
sum of all the variations, so that the summation index does not have the
interpretation of time value, but is an index enumerating the various error
values.

One deduces that the probability distribution of the list price values at 
time t satisfies a diffusion equation; furthermore the probability distribu- 
tion of successive increments of value is a product of independent Gaussian 
distributions and one arrives at a kind of preliminary version of the stochas- 
tic process that was later studied by Wiener (essentially in [Ba00] “only” 
the analysis of the continuity of the sample paths as functions of time is 
missing). 
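
The passage from the random walk of the prices to a diffusion equation can be illustrated by evolving the exact probabilities of the walk. This is only a sketch of the mechanism described above, not Bachelier's own computation: the names are hypothetical, and the identification $D = \Delta x^2/(2\Delta t)$ is the usual one for a walk with steps $\pm\Delta x$ at intervals $\Delta t$.

```python
import math

def evolve_walk(p, steps):
    """Evolve site probabilities {index: prob}; each step moves one site up or down with prob 1/2."""
    for _ in range(steps):
        q = {}
        for i, prob in p.items():
            q[i - 1] = q.get(i - 1, 0.0) + 0.5 * prob
            q[i + 1] = q.get(i + 1, 0.0) + 0.5 * prob
        p = q
    return p

def variance(p, dx):
    """Variance of the position i*dx under the site probabilities p."""
    mean = sum(i * dx * prob for i, prob in p.items())
    return sum((i * dx - mean) ** 2 * prob for i, prob in p.items())

def gaussian_density(x, t, D):
    """Solution of the diffusion equation started from a unit mass at the origin."""
    return math.exp(-x * x / (4.0 * D * t)) / math.sqrt(4.0 * math.pi * D * t)
```

After $n$ steps the variance is $n\,\Delta x^2 = 2Dt$ with $t = n\Delta t$, and the site probabilities are close to the Gaussian density times $2\Delta x$ (the factor 2 because only sites of the same parity as $n$ are occupied).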

Thus we can consider Bachelier’s work as similar to Smoluchowski’s theory 
and, therefore, rather loosely related to Einstein’s theory and, furthermore, 
in his analysis no mention appears of physics and thermodynamics. But we 
have seen in the above discussion that it is precisely here that one of the 
main difficulties of Brownian motion lies. 

Smoluchowski’s point of view is by no means superseded: in the last few
decades it has been developed into very refined theories aiming at under- 
standing the more general problem of deriving macroscopic continua equa- 
tions from microscopic dynamics: see [Sp91], part II, for a perspective and 
technical details. 


§8.3. The Uhlenbeck-Ornstein Theory


As remarked by Einstein (as well as by Smoluchowski) Brownian motion 
theory held for experimental observations taking place at time intervals 
spaced by a quantity large compared to the time scale characteristic for the 
loss of the velocity acquired in a single collision, which is $t_0 = (6\pi\eta m^{-1}R)^{-1}$.

For shorter time intervals it still makes sense to define the velocity of the
particles and motion cannot be described by the diffusive process
characteristic of Brownian fluctuations proper. The trajectories appear, when
observed over time scales larger than $t_0$, erratic and irregular so that if one
tries to measure the velocity by dividing the space run by the corresponding
time one finds a result depending on the time interval size and that
becomes larger and larger the more the time interval is reduced. This is an
immediate consequence of the fact that, on such time scales, the average of
the absolute value of the displacement is proportional to $\sqrt{t}$, rather than
to $t$. But this “divergence” of the velocity ceases as soon as one examines
motion on time scales short compared to $t_0$.

One is then faced with the problem of developing a theory by describing 
motions in the “normal” phase at small time intervals, as well as in the 
Brownian phase, at larger time intervals. Langevin proposed a very simple 
mathematical model for the complete Brownian motion equations. 

He imagined that successive collisions with fluid molecules had an effect on
the variations of each velocity component that could be described in terms
of a random impulsive force $F(t)$ and, hence, the equation of motion of a
coordinate of a colloidal particle would be:

$$m\dot v = -\lambda v + F(t) \tag{8.3.1}$$

where $\lambda$ is the friction coefficient for the colloid motion (i.e. $6\pi\eta R$ in the
case of a fluid in a Stokes regime and $c R^2 v_m m \rho$ in the case of a Doppler
regime, see (8.1.7)).

The Langevin equation, (8.3.1), can be discussed once a suitable random 
force law is assigned for F. The model proposed by Uhlenbeck and Ornstein 
for F(t) was that of a white noise, i.e. that F was such that: 


(1) no correlation existed between the values of F(t) at different time 
instants, 


(2) the distribution of an $n$-tuple $F(t_1), F(t_2), \ldots, F(t_n)$ of values of the
force, observed at $n$ arbitrary instants $t_1 < t_2 < \ldots < t_n$, was a Gaussian
distribution, and


(3) the average value of F(t) vanished identically as a function of t. 


This leads to the notion of a (centered) Gaussian stochastic process and 
to the more general notion of stochastic process and, also, it leads to the 
possibility of regarding Brownian motion as an “exactly soluble” stochastic 
process. 

Consider a stochastic process, i.e. a probability distribution, on a space of
events that can be represented as functions $t \to F(t)$ of one (or more)
variables, with zero average, see §5.7, footnote 9. It is characterized by giving the
probability of observing an $n$-tuple $F(t_1), F(t_2), \ldots, F(t_n)$ of force values
when measuring the $F(t)$ at $n$ instants $t_1 < t_2 < \ldots < t_n$ as a Gaussian
distribution on the force values.

It can be shown that such a process (i.e. the probability distribution of
the functions $t \to F(t)$) is uniquely determined by the two-point correlation
function, also called the covariance, or propagator. This function is defined
as the average value of the product of the function values at two arbitrary
instants $t_1, t_2$:

$$C(t_1, t_2) = \langle F(t_1)\,F(t_2)\rangle \tag{8.3.2}$$

and this means that the Gaussian distribution of the probability of an
arbitrary $n$-tuple of values of $F$ at $n$ distinct time instants can be simply
expressed in terms of the covariance $C$ (it is, in fact, simply expressible in
terms of the inverse matrix of the matrix $C(t_i, t_j)$, $i, j = 1, 2, \ldots, n$).³

In this way the “white noise” is defined as the Gaussian process with
covariance

$$C(t, t') = f^2\,\delta(t - t') \tag{8.3.3}$$

where $f^2$ is a constant and $\delta$ is Dirac’s delta function.

We shall write $f^2$ as $2\lambda k_B T$ (thus defining $T$) and it will appear that if $T$ is
identified with the absolute temperature of the solvent then the
Uhlenbeck-Ornstein theory and the Einstein theory will agree where they should, see
below.
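
To see what the white noise assumption means in practice one can integrate the Langevin equation (8.3.1) numerically. The sketch below uses a simple Euler step, which is an assumption of this illustration and not the method of [UO30]: over a step `dt` the integral of $F$ is replaced by a Gaussian impulse of variance $f^2\,dt$, with $f^2 = 2\lambda k_B T$. All names and parameter values are hypothetical.

```python
import math
import random

def simulate_langevin(m, lam, kT, dt, n_steps, v0=0.0, seed=0):
    """Euler integration of m dv/dt = -lam v + F(t) with white noise of strength 2 lam kT."""
    rng = random.Random(seed)
    f2 = 2.0 * lam * kT               # white-noise strength f^2
    kick = math.sqrt(f2 * dt) / m     # std of the velocity change due to F over one step
    v = v0
    velocities = []
    for _ in range(n_steps):
        v += -(lam / m) * v * dt + kick * rng.gauss(0.0, 1.0)
        velocities.append(v)
    return velocities
```

If the choice of $f^2$ is the right one, the time average of $v^2$ over a long run should be close to the equipartition value $k_B T/m$, up to statistical fluctuations and to a discretization error of order $\lambda\,dt/m$.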

For $C$ given by (8.3.3) an explicit solution of (8.3.1) is possible and it
follows, as shown by Uhlenbeck and Ornstein, [UO30], that each velocity and
position component which is generated from initial data $s_0, v_0$ (respectively
for the position and the velocity) is a Gaussian process with nonzero average.
If $\beta = \lambda/m = t_0^{-1}$ (see the lines following (8.1.1)), their average, at time $t$,
is given by

$$\bar s(t) = s_0 + \frac{v_0}{\beta}\left(1 - e^{-\beta t}\right), \qquad \bar v(t) = v_0\, e^{-\beta t} \tag{8.3.4}$$

which follow simply by averaging (8.3.1) over the distribution of $F$ (so that
the term with $F$ disappears because $\langle F\rangle = 0$ by assumption), and then
integrating the resulting equations for the average velocity; and the probability
distribution of a velocity component $v$ at time $t$ is the Gaussian:

$$G(v,t) = \left(\frac{m}{2\pi k_B T\,(1 - e^{-2\beta t})}\right)^{1/2} \exp\left(-\frac{m}{2k_B T}\,\frac{(v - \bar v(t))^2}{1 - e^{-2\beta t}}\right) \tag{8.3.5}$$
³ Starting from a Gaussian probability distribution over $n$ variables $x_1, \ldots, x_n$ of the form

$$\pi(dx) = \mathrm{const}\; e^{-\frac12\sum_{i,j=1}^{n} M_{ij}x_i x_j}\prod_{i=1}^{n} dx_i \tag{$*$}$$

where $M$ is a positive definite symmetric matrix, then:

$$\int \pi(dx)\, x_i x_j = (M^{-1})_{ij} \tag{$**$}$$

defining the covariance of the Gaussian process $\pi$ and identifying it with the inverse of
the matrix of the quadratic form defining $\pi$. Also, for all vectors $\varphi = (\varphi_1, \ldots, \varphi_n)$, we
have:

$$\int \pi(dx)\, e^{\sum_j \varphi_j x_j} = e^{\frac12\sum_{i,j}(M^{-1})_{ij}\varphi_i\varphi_j} \tag{$***$}$$

Vice versa ($**$) plus Gaussianity of $\pi$ implies the other two and also ($***$) implies the other
two (and Gaussianity of $\pi$). Finally one should remark that any random variable which
is a linear combination of the Gaussian variables $x_i$ has a probability distribution that
is Gaussian. One now replaces sums with integrals and matrices with operators and
one obtains the corresponding notions and relations for a Gaussian stochastic process
indexed by a continuum time label.




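This is computed also from (8.3.1) by writing it as

$$v(t) = e^{-\beta t}v_0 + \int_0^t e^{-\beta(t-t')}\,\frac{F(t')}{m}\,dt' \tag{8.3.6}$$

so that squaring both sides and using (8.3.2) one gets

$$\langle(v(t) - e^{-\beta t}v_0)^2\rangle = \int_0^t\!\!\int_0^t dt'\,dt''\, e^{-\beta(t-t')}e^{-\beta(t-t'')}\,\frac{f^2}{m^2}\,\delta(t'-t'') = \frac{f^2}{2\beta m^2}\left(1 - e^{-2\beta t}\right) = \frac{k_B T}{m}\left(1 - e^{-2\beta t}\right) \tag{8.3.7}$$

hence (8.3.5) follows because the distribution of $v(t)$ must be Gaussian since
(8.3.6) shows that $v(t)$ is a linear combination of Gaussian variables, see
footnote 3.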

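By integrating (8.3.6) once more one obtains in the same way also the
distribution of the position component $s(t)$. It is a Gaussian with center at
$\bar s(t)$ and quadratic dispersion

$$\sigma(t) = \frac{k_B T}{m\beta^2}\left(2\beta t - 3 + 4e^{-\beta t} - e^{-2\beta t}\right) \xrightarrow[t\to\infty]{} \frac{2k_B T}{m\beta}\,t \equiv 2Dt \tag{8.3.8}$$

$$H(s,t) = \left(\frac{1}{2\pi\sigma(t)}\right)^{1/2} e^{-(s-\bar s(t))^2/2\sigma(t)} \tag{8.3.9}$$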


The formulae just described reduce to the previous ones of Einstein’s theory
in the limit $t \to \infty$, but they hold also if $t < t_0 = \beta^{-1}$ and hence they solve
the problem of the colloidal particle motions over time scales of the order
of $t_0$ or less.
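
The two regimes can be read off directly from the dispersions of the velocity and of the position. The following sketch simply evaluates the standard Ornstein-Uhlenbeck expressions $\frac{k_B T}{m}(1 - e^{-2\beta t})$ and $\frac{k_B T}{m\beta^2}(2\beta t - 3 + 4e^{-\beta t} - e^{-2\beta t})$; the function names and the parameter values used to exercise them are assumptions of this illustration.

```python
import math

def velocity_variance(t, kT, m, beta):
    """Dispersion of a velocity component around its average v0 exp(-beta t)."""
    return (kT / m) * (1.0 - math.exp(-2.0 * beta * t))

def position_variance(t, kT, m, beta):
    """Dispersion of a position component around its average."""
    bt = beta * t
    return (kT / (m * beta**2)) * (2.0 * bt - 3.0 + 4.0 * math.exp(-bt) - math.exp(-2.0 * bt))
```

For $\beta t \gg 1$ the position dispersion approaches $2Dt$ with $D = k_B T/(m\beta)$, while expanding the exponentials for $\beta t \ll 1$ gives $\frac{2}{3}\frac{k_B T}{m}\beta t^3$: at short times the spreading around the average trajectory is much slower than diffusive, as expected for a motion with a well-defined velocity.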

The relation $f^2/(2\lambda^2) = D$ (i.e. $k_B T\lambda^{-1} = D$) connecting viscosity (or
“dissipation”) and microscopic force fluctuations due to collisions with the solvent
was the first example of a series of similar relations called
“fluctuation-dissipation theorems”.

Uhlenbeck and Ornstein also computed the “joint” probability distributions
of the values $v(t_1), s(t_1), \ldots, v(t_n), s(t_n)$ for arbitrary $t_1, \ldots, t_n$; the
resulting Gaussian process (i.e. the probability distribution of the
two component functions $t \to (v(t), s(t))$) is therefore called an
Ornstein-Uhlenbeck process.

A modern discussion of the theory can be found in [Sp91], part I, Chap.8: 
here the motion of a Brownian particle is discussed by treating it as a
“tracer” revealing the underlying microscopic motions. This is perhaps the 
main role of Brownian motion in macroscopic physics. 


§8.4. Wiener’s Theory


From a mathematical viewpoint one can consider an idealized random
motion with the property that the position $r$ at a time $\vartheta + t$ relative to that at
time $\vartheta$ has, even for an extremely small time $t$, the probability given by

$$P(r)\,d^3r = \frac{e^{-r^2/4Dt}}{(4\pi Dt)^{3/2}}\,d^3r \tag{8.4.1}$$

i.e. given, even for small time, by the asymptotic distribution (as $t \to \infty$ or
$t \gg t_0$) of the Brownian motion.

Clearly the proportionality of $r^2$ to $t$ (rather than to $t^2$) for small $t$ means
that one shall find a motion with the remarkable property of not having a
well-defined velocity at any time, just as the Brownian motion observed over
time scales longer than the previously introduced $t_0$.

The very possibility of rigorously defining such an object is the remarkable
contribution of Wiener (1923), who showed that the Gaussian process with
transition probability (8.4.1) (already introduced by Bachelier in the above
quoted article, [Ba00]) is well defined from a mathematical viewpoint and
that with probability 1 the paths described by the particles are continuous,
and in fact Hölder continuous with exponent $\alpha$ (with any $\alpha < 1/2$, see
(8.4.7) below) (Wiener theorem, [Ne67], [IM65]).

The Gaussian process describing the probability of trajectories $t \to r(t)$ in
which the increments of $r$ are distributed independently and with Gaussian
distribution (8.4.1) is, in probability theory, called a Wiener process or,
simply, a (mathematical) Brownian motion. From the Physics viewpoint
it corresponds to the description of the asymptotic behavior of a colloidal
particle in a fluid, for times $t$ large compared to the characteristic
relaxation time $t_0$ (while for generic times, including the short times, it is rather
described by the Ornstein-Uhlenbeck process).

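More technically we can translate into a rather simple formula the
statement that the increments $\omega(t) - \omega(t')$ of one (of the three) coordinates of
a Brownian motion are independent and distributed according to (8.4.1).
This means that if $0 < t_1 < t_2 < \ldots < t_n$ then the probability $p$ that
$\omega(t_1) \in dx_1, \omega(t_2) \in dx_2, \ldots, \omega(t_n) \in dx_n$ is given by:

$$p = \prod_{j=1}^{n} \frac{dx_j}{\sqrt{4\pi D(t_j - t_{j-1})}}\; e^{-\frac14\frac{(x_j - x_{j-1})^2}{D(t_j - t_{j-1})}}, \qquad x_0 = 0,\ t_0 = 0 \tag{8.4.2}$$

where, in the first factor, $x_0 = 0$ and $t_0 = 0$ denote the initial position and time.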
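
The product structure of (8.4.2) says that a sample path can be generated, at any finite set of times, by adding independent Gaussian increments of variance $2D(t_j - t_{j-1})$. A minimal sketch follows; the function name, the use of Python's standard `random` module, and the parameter values are assumptions of this illustration.

```python
import math
import random

def sample_wiener_path(times, D, rng):
    """One coordinate of a Wiener path, started at 0 at time 0, at increasing `times`."""
    x, t_prev, path = 0.0, 0.0, []
    for t in times:
        # independent increment with zero mean and variance 2 D (t - t_prev)
        x += rng.gauss(0.0, math.sqrt(2.0 * D * (t - t_prev)))
        path.append(x)
        t_prev = t
    return path
```

Averaging the square of the last position over many such paths should reproduce $2Dt$ up to statistical fluctuations, for any choice of the intermediate times.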

Although Wiener’s process is, as we have seen, a “mathematical abstrac- 
tion” it has, nevertheless, great theoretical interest and it appears in the 
most diverse fields of Physics and Mathematics. 

Its first application was to provide several quadrature formulae that express 
solutions of various partial differential equations in an “explicit form”, as
integrals over families of curves randomly distributed with a Wiener process 
law, [Ne67]. 

Obviously the calculation of such integrals is, usually, not simpler than the
solution of the same equations with more traditional methods.
Nevertheless the explicit nature of the formulae provides an intuitive representation
of the solutions of certain partial differential equations and often leads to
surprisingly simple and strong a priori estimates of their properties.

A classical example is the theory of the heat equation:

$$\partial_t u = D\,\Delta u, \qquad u|_{t=0} = u_0(x) \tag{8.4.3}$$

whose solution can be written as

$$u(x,t) = \int dy \int P^t_{y,x}(d\omega)\, u_0(y) \tag{8.4.4}$$

where the integral is extended over all continuous curves $\tau \to \omega(\tau)$ that
at $\tau = 0$ start from the position $y$ arriving at time $t$ at $x$. The integration
(“sum”) over the paths is performed by using the distribution of the Wiener
process, subject to the condition of reaching $x$ at time $t$.

Equation (8.4.3) has the following interpretation: heat undergoes a
Brownian motion, i.e. it is transferred from point to point following a random
motion with distribution given by the Wiener process. Therefore the amount of
heat $u(x,t)$ which at time $t$ is in $x$ can be obtained by imagining that the
amount of heat initially in a generic point $y$ is equitably distributed among
all trajectories of the Wiener process that leave $y$, so that the amount of
heat that one finds in $x$ at time $t$ (i.e. $u(x,t)$) is the sum over all
Brownian paths that arrive at $x$ each carrying an amount of heat proportional
to the amount $u_0(y)dy$ initially around the point $y$ where they originated;
and the proportionality factor is precisely equal to the fraction of Brownian
trajectories that start in the volume element $dy$ and arrive at $x$ in the time
interval $t$.
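
In one dimension the fraction of Brownian trajectories going from $y$ to $x$ in a time $t$ is just the Gaussian transition density, so for the plain heat equation the sum over paths reduces to an ordinary integral over the starting point $y$. The sketch below evaluates that integral with a midpoint rule; the function name, the integration window and the number of nodes are assumptions made for the illustration.

```python
import math

def heat_solution(u0, x, t, D, y_min=-20.0, y_max=20.0, n=4000):
    """u(x,t) as the integral over y of the Gaussian transition density times u0(y)."""
    h = (y_max - y_min) / n
    total = 0.0
    for i in range(n):
        y = y_min + (i + 0.5) * h   # midpoint of the i-th cell
        kernel = math.exp(-(x - y) ** 2 / (4.0 * D * t)) / math.sqrt(4.0 * math.pi * D * t)
        total += kernel * u0(y) * h
    return total
```

For the initial datum $u_0(y) = e^{-y^2/2}$ the result can be compared with the closed form $e^{-x^2/(2(1 + 2Dt))}/\sqrt{1 + 2Dt}$, obtained by convolving two Gaussians.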

Formula (8.4.4) is the simplest instance of a class of formulae that solve
partial differential equations; a further classical example is provided by the
equation

$$\partial_t u = D\,\Delta u + V(x)\,u, \qquad u|_{t=0} = u_0(x) \tag{8.4.5}$$

which can be explicitly solved by the quadrature

$$u(x,t) = \int dy \int P^t_{y,x}(d\omega)\, e^{\int_0^t V(\omega(\tau))\,d\tau}\, u_0(y) \tag{8.4.6}$$

which is called the Feynman-Kac quadrature formula.

As an example of a simple application of (8.4.6) one can derive a comparison
theorem for solutions of the equation $\partial_t u = D\Delta u + V_j(x)u$, $u(0) = u_0 \ge 0$,
where $V_j(x)$, $j = 1,2$, are two functions (not necessarily positive) such that
$V_2(x) \le V_1(x)$. Then (8.4.6) immediately implies that $u_2(x,t) \le u_1(x,t)$
for all $x$ and $t > 0$, a property that is not so easy to prove otherwise.
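
The comparison argument is visible path by path in a numerical version of the Feynman-Kac quadrature. The sketch below is a crude Monte Carlo illustration and not a recommended solver: by the symmetry of the Wiener increments the paths are started at $x$ and $u_0$ is evaluated at their end point, the time integral of $V$ is replaced by a Riemann sum, and all names, step numbers and seeds are assumptions.

```python
import math
import random

def feynman_kac_estimate(V, u0, x, t, D, n_paths=2000, n_steps=50, seed=0):
    """Monte Carlo average of exp(integral of V along the path) * u0(end point)."""
    rng = random.Random(seed)
    dt = t / n_steps
    step = math.sqrt(2.0 * D * dt)      # std of one Wiener increment
    total = 0.0
    for _ in range(n_paths):
        w, action = x, 0.0
        for _ in range(n_steps):
            action += V(w) * dt          # Riemann sum for the integral of V(w(tau))
            w += step * rng.gauss(0.0, 1.0)
        total += math.exp(action) * u0(w)
    return total / n_paths
```

If the same seed is used for two potentials with $V_2 \le V_1$ the same paths are generated, every path weight for $V_2$ is not larger than the one for $V_1$, and hence (for $u_0 \ge 0$) the estimate of $u_2$ cannot exceed that of $u_1$. For a constant potential $V = c$ and $u_0 = 1$ the estimate reduces to $e^{ct}$.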

Equation (8.4.6) admits various extensions, relevant both in mathematics
and in physics in very diverse fields ranging from probability theory or
partial differential equations to statistical mechanics and relativistic quantum
field theory and even to the foundations of quantum mechanics (see the
analysis of hidden variables in Nelson’s or in Bohm’s quantum mechanics
formulations, [Ne67], [BH93]). One can say that in these applications the
formulae of explicit solution really play a role similar to that played by the
classical quadrature formulae in classical mechanics.

Wiener’s process had, and still has, a particular importance in probability
theory, where it introduced a wealth of ideas and problems (and provided
solutions to several preexisting problems). We quote the following as
examples among the mathematical properties of the process.


(1) P. Lévy’s regularity law. This gives the behavior of the variation
$\omega(t) - \omega(t')$ of a position component $\omega(t)$ of a Wiener process trajectory
$t \to \omega(t)$ observed at two nearby instants $t, t'$ within a prefixed time interval
$[0, \bar t]$. It has already been said that Wiener proves that the trajectory is Hölder
continuous with an exponent $\alpha$ that can be fixed arbitrarily provided it is
smaller than $\frac12$. This means that, for arbitrarily fixed $\alpha < 1/2$, we shall
have with probability 1:

$$\lim_{t - t' \to 0} \frac{|\omega(t) - \omega(t')|}{|t - t'|^{\alpha}} = 0 \tag{8.4.7}$$

if $0 \le t, t' \le \bar t$. The arbitrariness of $\alpha < 1/2$ makes it interesting to ask
what is the “optimal” value for $\alpha$, if any. Lévy’s law says that there is no
optimal $\alpha$, but at the same time it provides an answer to what is the actual
regularity of a trajectory $\omega$ because it states

$$\limsup_{\substack{t - t' \to 0 \\ 0 \le t, t' \le \bar t}} \frac{|\omega(t) - \omega(t')|}{\left(4D|t - t'| \log\frac{1}{|t - t'|}\right)^{1/2}} = 1 \tag{8.4.8}$$

with probability 1, [IM65].


(2) But Lévy’s regularity law does not provide us with information
about the properties of the trajectory in the vicinity of a given instant: in
fact (8.4.8) only gives the worst behavior, i.e. it only measures the maximal
lack of regularity within a specified time interval $[0, \bar t]$. If we concentrate
on a given instant $t$ then, in general, the trajectory will not be as irregular.
This is in fact the content of the iterated logarithm law of Khinchin: the law
gives the regularity property of a trajectory at a prefixed instant $t$. Fixing
$t = 0$, and supposing $\omega(0) = 0$, the law is

$$\limsup_{t\to 0} \frac{|\omega(t)|}{\left(4Dt\log(\log\frac1t)\right)^{1/2}} = 1 \tag{8.4.9}$$

with probability 1. Equation (8.4.9) is not incompatible with (8.4.8). In fact
it only says that the worst possible behavior described by (8.4.8) is in fact
not true with probability 1 at a prefixed instant, i.e. it happens certainly, by
the previous law, but certainly as well it does not happen at the time $t$ at
which one has decided to look at the motion! [IM65].


(3) The above two laws deal with the behavior of the trajectories at finite
times; one can ask what is the long-time behavior of a sampled trajectory.
The Einstein and Smoluchowski theories foresee that the motion goes away
from the vicinity of the origin (i.e. of the starting point) by a distance that
grows, on average, proportionally to $\sqrt{t}$.

An analysis of these theories indicates that by average one means average
over a statistical ensemble. If at time $t$ one measures the square of a
coordinate of the particle which at time 0 was at the origin and if one does
this for many Brownian particles (i.e. if one repeats the measurement many
times) one finds $2Dt$, on average.

However this does not mean that if one fixes attention on a single motion
and one observes it as $t$ grows, every coordinate $\omega(t)$ squared grows at
most as $2Dt$ in the sense that the limit superior as $t \to \infty$ of $\omega(t)^2/t$ is
$2D$. In fact such growth is really given by the global iterated logarithm law
(Khinchin):

$$\limsup_{t\to\infty}\frac{|\omega(t)|}{\left(4Dt\log(\log t)\right)^{1/2}} = 1 \tag{8.4.10}$$

with probability 1, [IM65].

Although (8.4.10) deals with Wiener process properties it expresses a prop- 
erty of relevance for experiments on Brownian motion: unlike the two previ- 
ous laws, which look at properties characteristic of the Wiener process and 
not of real Brownian motions (which are rather described by the Ornstein— 
Uhlenbeck process), this is a property that refers to the large time behavior 
(which is the same for the Wiener process and for the Ornstein-Uhlenbeck 
process). However it is very difficult to perform experiments so accurate 
as to reveal a correction to the displacement which is proportional to the 
square root of an iterated logarithm. 


(4) Equation (8.4.10) does not invalidate the measurability of $D$ based on
the observation of a single trajectory. Such measurements are performed by
following the displacement $\omega(\tau)$ of a coordinate as $\tau$ varies between 0 and $\bar t$.
One then sets

$$X(t) = \bar t^{\,-1}\int_0^{\bar t - t} \left(\omega(\tau + t) - \omega(\tau)\right)^2 d\tau \tag{8.4.11}$$

and a fit is attempted by comparing the data $X(t)$ with the function $2Dt$.
The procedure is correct, at least asymptotically as $\bar t \to \infty$, because one
shows that:

$$\lim_{\bar t \to \infty} \frac{X(t)}{2Dt} = 1 \tag{8.4.12}$$

with probability 1. This is the ergodic theorem for the Wiener process. Like
the comparison between laws (1), (2) above, law (3) tells us that our particle
will be “far” ahead of where it should be infinitely many times (i.e. a factor
$(\log\log t)^{1/2}$ ahead, (8.4.10)) although on average it will be at a distance
proportional to $\sqrt{t}$ ((8.4.12)), [IM65].


(5) The trajectories run by the Wiener process are rather irregular, as the
Lévy and Khinchin laws quantitatively show. One can ask which is the
fractal dimension of the set described by a Wiener process trajectory. If
the dimension of the space in which the motion takes place is $\ge 2$ then the
fractal dimension is 2, in the sense of Hausdorff. This essentially means that
if one wants to cover the trajectory with spheres of radius $1/n$ one needs
$O(n^2)$ spheres in the sense that given $\varepsilon > 0$ then $O(n^{2-\varepsilon})$ are (eventually as
$n \to \infty$) not sufficient while $O(n^{2+\varepsilon})$ are sufficient (even if $n \to \infty$). Note
that a smooth curve, with finite length, can be covered by just $O(n)$ spheres
of radius $n^{-1}$.

The property of showing dimension 2 can be expressed also in several other 
ways which are perhaps intuitively equivalent, at a superficial level of under- 
standing, but strictly speaking different, and the analysis of the alternative 
ways illustrates subtle aspects of the Wiener process trajectories.

For instance if one considers two distinct points in $R^d$ and from each of
them one starts a Wiener path, then one finds that the two paths will
eventually “cross” (i.e. they reach the same point at some later time) with
probability 1 if $d = 2, 3$, as expected intuitively on the basis that
dimensionally they are “surfaces”. But they do not cross if $d \ge 4$ (with probability 1):
if they were really 2-dimensional geometric objects we would expect that
they would intersect not only for $d = 2, 3$ but for $d = 4$ as well. This is
Lawler’s theorem, [La85], see also [HS92].


(6) A further celebrated property of the Wiener process, due to Wiener 
himself, exhibits interesting connections with harmonic analysis and Fourier 
series theory. Consider a sequence $g_0, g_1, \ldots$ of Gaussian independent and
equidistributed random variables and suppose that the distribution of each
of them is $(2\pi)^{-1/2} \exp(-g^2/2)$. Set


$$\omega(t) = \frac{t}{\sqrt{\pi}}\, g_0 + \left(\frac{2}{\pi}\right)^{1/2} \sum_{k\ge 1} \frac{\sin kt}{k}\, g_k \tag{8.4.13}$$


Then the random function $\omega(t)$, for $0 \le t \le \pi$, has a probability distribution
(induced by the one assumed for the coefficients $g_k$) identical to that of a
Wiener process sample path in dimension 1 (and $D = \frac{1}{2}$), [IM65]. This
remarkable fact is easy to check; since the covariance of a Gaussian process
determines the process, it suffices to check that if $t > t'$ then, see (8.4.2):


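$$\langle \omega(t)\,\omega(t')\rangle = \int \frac{e^{-\frac{1}{2}\frac{y^2}{t'}}\; e^{-\frac{1}{2}\frac{(x-y)^2}{t-t'}}}{2\pi\sqrt{t'(t-t')}}\; x\,y\; dx\,dy \tag{8.4.14}$$

The left-hand side is immediately computable from (8.4.13) and from the
assumed Gaussian distribution of the $g_k$’s and the r.h.s. is an elementary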
integral so that the identity is easily checked. 
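
The check of the covariance can also be done numerically. The sketch below is only an illustration: the function names, the truncation of the series at `n_terms` terms and the sample values of $t, t'$ are assumptions made here, not part of the text. It uses the fact that the $g_k$ are independent with zero mean and unit variance, so that the average of the product of two series of the form (8.4.13) reduces to a sum of diagonal terms, and it compares that sum with $2D\min(t,t')$ for $D = \frac{1}{2}$.

```python
import math


def series_covariance(t, s, n_terms=20000):
    """Covariance <w(t) w(s)> implied by the series (8.4.13), truncated.

    Only the diagonal terms survive the average, because the g_k are
    independent with zero mean and unit variance.
    """
    total = t * s / math.pi  # contribution of g_0
    for k in range(1, n_terms + 1):
        total += (2.0 / math.pi) * math.sin(k * t) * math.sin(k * s) / (k * k)
    return total


def wiener_covariance(t, s, D=0.5):
    """Covariance 2 D min(t, s) of a one-dimensional Wiener path started at 0."""
    return 2.0 * D * min(t, s)


# Sample points in [0, pi]; the two columns should agree up to the
# truncation error of the series, which is of order 1 / n_terms.
for t, s in [(2.0, 0.7), (3.0, 1.5), (1.0, 1.0)]:
    print(t, s, series_covariance(t, s), wiener_covariance(t, s))
```

The truncation error is bounded by $(2/\pi)\sum_{k > n} k^{-2}$, so a few thousand terms already give agreement to several digits.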

To conclude one can say that the Wiener process is a mathematical ab- 
straction originated from the physical phenomenon of Brownian motion: it 
describes its “large time” behavior (the behavior for all times being caught, 
more appropriately, by the Ornstein-Uhlenbeck process). It is nevertheless 
a mathematical entity of great interest which finds applications in the most
diverse (and unexpected) fields of mathematics and physics. Further reading on both the mathematical and physical aspects of Brownian motion can
be found in [Sp91].
§9.1. Ergodic Hypothesis Revisited


An informal overview of the basic ideas of Chap.IX is given in Appendix
9.A1 below, which is the text of a conference at the Séminaire de Philosophie
et Mathématiques of the École Normale Supérieure in Paris.

Giving up a detailed description of microscopic motion led to a statisti- 
cal theory of macroscopic systems and to a deep understanding of their 
equilibrium properties which we have discussed in Chap.I-VII. 

It is clear today, as it was already to Boltzmann and many others, that 
some of the assumptions and guiding ideas used in building up the theory 
were not really necessary or, at least, could be greatly weakened or just 
avoided. 

A typical example is the ergodic hypothesis. Although we have analyzed 
it in some detail in Chap.I it is interesting to revisit it from a different 
perspective. The analysis will not only help clarify aspects of nonequilibrium 
statistical mechanics, but it will be important for its very foundations. 

We have seen the important role played by the heat theorem of Boltzmann, 
[Bo84]. We recall that one can define in terms of time averages of total 
or kinetic energy, of density, and of average momentum transfer to the 
container walls, quantities that one could call, respectively, specific internal 
energy u, temperature T, specific volume v, pressure p; and the heat theorem 
states that when two of them are varied, say the specific energy and volume by
$du$ and $dv$, the relation

$$\frac{du + p\,dv}{T} = \text{exact} \tag{9.1.1}$$

holds.

In the beginning, [Bo66], this was discussed in very special cases (like free 
gases), but about fifteen years later Helmholtz, influenced by the progress of 
Boltzmann on the proof of the heat theorem, wrote a series of four ponder- 
ous papers on a class of very special systems, which he called monocyclic, in 
which all motions were periodic and in a sense non-degenerate, and he noted 
that one could give appropriate names, familiar in macroscopic thermody- 
namics, to various mechanical averages and then check that they satisfied 
the relations that would be expected between the thermodynamic quantities 
with the same name. 

Helmholtz’ assumptions about monocyclicity are very strong and seem to 
be satisfied in no system other than in confined one-dimensional Hamilto- 
nian systems. Here are the details of Helmholtz’ reasoning (as reported by 
Boltzmann), in a simple example. 

Consider a one-dimensional system in a confining potential.¹ There is only
one motion per energy value (up to a shift of the initial datum along its
trajectory) and all motions are periodic so that the system is monocyclic.
We suppose that the potential $\varphi(x)$ depends on a parameter $V$.


1 A potential $\varphi(x)$ such that $|\varphi'(x)| > 0$ for $|x| > 0$, $\varphi''(0) > 0$ and $\varphi(x) \to +\infty$ as $x \to \infty$.
Suppose that one identifies a state with a motion with given energy $E$ and
given $V$. Then, let


$U$ = total energy of the system = $K + \varphi$,

$T$ = time average of the kinetic energy $K$,

$V$ = the parameter on which $\varphi$ is supposed to depend,

$p$ = $-$ time average of $\partial_V \varphi$.


A state is parameterized by $U,V$ and if such parameters change by $dU, dV$
respectively we define:

$$dL = -p\,dV, \qquad dQ = dU - dL. \tag{9.1.2}$$

Then:


Theorem (Helmholtz): The differential dQ/T = (dU + pdV)/T is exact. 


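Repeating, for convenience, the few lines of proof already discussed in Appendix 1.A1 to Chap.I, this can be proved by directly exhibiting a function
$S$ whose differential is $(dU + p\,dV)/T$. In fact let $x_\pm(U,V)$ be the extremes
of the oscillations of the motion with given $U,V$ and define $S$ as

$$S(U,V) = 2\log \int_{x_-(U,V)}^{x_+(U,V)} \sqrt{U - \varphi(x)}\,dx \tag{9.1.3}$$

and $S/2$ is the logarithm of the action (up to an additive constant) since $U - \varphi(x)$ is the kinetic energy
$K(x;U,V)$; so that, the variation of the extremes $x_\pm$ giving no contribution because the integrand vanishes there,

$$dS = \frac{\int \left(dU - \partial_V \varphi(x)\,dV\right) \frac{dx}{\sqrt{K}}}{\int K\, \frac{dx}{\sqrt{K}}} \tag{9.1.4}$$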


Noting that $\frac{dx}{\sqrt{K}} = \sqrt{\frac{2}{m}}\,dt$, we see that the time averages are obtained by
integrating with respect to $\frac{dx}{\sqrt{K}}$ and dividing by the integral of $\frac{1}{\sqrt{K}}$. Hence:


$$dS = \frac{dU + p\,dV}{T} \tag{9.1.5}$$

completing Helmholtz’ remark. For a more extended discussion of the theorem see Appendix 1.A1 to Chap.I.

Boltzmann saw that this was not a simple coincidence: his interesting (and
healthy) view of the continuum which, probably, he never really considered
more than a convenient artifact, useful for computing quantities describing
a discrete world where sums and differences could be approximated by integrals and derivatives, cf. §1.9 and [Bo74] p. 43, led him to think that in
some sense monocyclicity was not a strong assumption.

Motions tend to recur (and they do in systems with a discrete phase
space) and in this light monocyclicity would simply mean that, waiting
long enough, the system would come back to its initial state. Thus its motion would be monocyclic and one could try to apply Helmholtz’ ideas (in
turn based on his own previous work) and perhaps deduce the heat theorem
in great generality. The nondegeneracy of monocyclic systems becomes the
condition that for each energy there is just one cycle and the motion visits
successively all (discrete) phase space points.

Taking this viewpoint one had the possibility of checking that in all mechan- 
ical systems one could define quantities that one could name with “thermo- 
dynamic names” and which would satisfy properties coinciding with those 
that thermodynamics would predict for them, see Chap.I, II. 

He then considered the two-body problem, showing that the thermody- 
namic analogies of Helmholtz could be extended to systems which were 
degenerate, but still with all motions periodic. This led to somewhat ob- 
scure considerations that seemed to play an important role for him, given 
the importance he gave them. They certainly do not help in encouraging 
reading his work: the breakthrough paper of 1884, [Bo84], starts with asso- 
ciating quantities with a thermodynamic name to Saturn’s rings (regarded 
as rigid rotating rings!) and checking that they satisfy the right relations, 
like the second principle, see (9.1.1). 

In general one can call monocyclic a system with the property that there
is a curve $\ell \to x(\ell)$, parameterized by its curvilinear abscissa $\ell$, varying in
an interval $0 \le \ell \le L(E)$, closed and such that $x(\ell)$ covers all the positions
compatible with the given energy $E$.

Let $x = x(\ell)$ be the parametric equations so that energy conservation can
be written, for some $m > 0$,

$$\frac{1}{2}\, m\, \dot\ell^2 + \varphi(x(\ell)) = E. \tag{9.1.6}$$

Then if we suppose that the potential energy $\varphi$ depends on a parameter $V$
and if $T$ is the average kinetic energy, $p = -\langle \partial_V \varphi\rangle$ then, for some $S$,

$$dS = \frac{dE + p\,dV}{T}, \qquad p = -\langle \partial_V \varphi\rangle, \quad T = \langle K\rangle \tag{9.1.7}$$

where $\langle\cdot\rangle$ denotes the time average (see Appendix 1.A1, Chap.I).

A typical case to which the above can be applied is the case in which the 
whole energy surface consists of just one periodic orbit, or when at least 
only the phase space points that are on such orbit are observable. Such 
systems provide, therefore, natural models of thermodynamic behavior. 
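
Helmholtz' theorem for a one-dimensional monocyclic system can be tested numerically. The sketch below is an illustration under assumptions that are not in the text: the quartic potential $\varphi(x) = V x^4$, the values of $U$ and $V$, the quadrature rule and all function names are choices made here. It computes $S(U,V)$ from (9.1.3), obtains $T$ and $p$ as time averages with weight $dx/\sqrt{K}$, and compares the finite-difference derivatives of $S$ with $1/T$ and $p/T$, as (9.1.5) requires.

```python
import math


def phi(x, V):
    """Illustrative confining potential (an assumption): phi(x) = V * x**4."""
    return V * x ** 4


def dphi_dV(x, V):
    """Derivative of phi with respect to the parameter V."""
    return x ** 4


def _midpoints(n):
    """Midpoints of n equal subintervals of (-pi/2, pi/2), and their width."""
    h = math.pi / n
    return [-math.pi / 2 + (i + 0.5) * h for i in range(n)], h


def entropy(U, V, n=2000):
    """S(U,V) = 2 log of the integral of sqrt(U - phi) between turning points."""
    a = (U / V) ** 0.25  # turning points are x = -a and x = +a
    thetas, h = _midpoints(n)
    # The substitution x = a sin(theta) makes the integrand smooth at the ends.
    integral = h * sum(
        math.sqrt(max(U - phi(a * math.sin(th), V), 0.0)) * a * math.cos(th)
        for th in thetas
    )
    return 2.0 * math.log(integral)


def time_average(f, U, V, n=2000):
    """Average of f(x) over one period, using dt proportional to dx / sqrt(K)."""
    a = (U / V) ** 0.25
    thetas, _ = _midpoints(n)
    num = den = 0.0
    for th in thetas:
        x = a * math.sin(th)
        weight = a * math.cos(th) / math.sqrt(U - phi(x, V))
        num += f(x) * weight
        den += weight
    return num / den


U, V, d = 1.3, 0.7, 1e-4
T = time_average(lambda x: U - phi(x, V), U, V)    # average kinetic energy
p = -time_average(lambda x: dphi_dV(x, V), U, V)   # p = -<d phi / dV>
dS_dU = (entropy(U + d, V) - entropy(U - d, V)) / (2 * d)
dS_dV = (entropy(U, V + d) - entropy(U, V - d)) / (2 * d)
print("dS/dU =", dS_dU, "  1/T =", 1 / T)
print("dS/dV =", dS_dV, "  p/T =", p / T)
```

The two pairs of numbers should coincide up to discretization error, which is the statement that $T^{-1}$ is an integrating factor of $dU + p\,dV$ for this system.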

A chaotic system like a gas in a container of volume $V$, which will be regarded as an important parameter on which the potential $\varphi$ (which includes
interaction with the container walls) depends, will satisfy “for practical purposes” the above property, because (Feynman) “if we follow our solution
[i.e. motion] for a long enough time it tries everything that it can do, so to
speak” (see p. 46-55 in [Fe63], vol. I). Hence we see that we should be able
to find a quantity $p$ such that $dE + p\,dV$ admits the inverse of the average kinetic energy
as an integrating factor.
On the other hand if we accept the viewpoint (ergodic hypothesis) that
phase space is discrete and motion on the energy surface is a monocyclic
permutation of its finitely many cells, the time averages can be computed by
integrals with respect to the uniform distribution, that we shall call Liouville
distribution,² see §1.7.

Hence if $\mu$ is the Liouville distribution on the surface of constant energy
$U$, and $T$ is the $\mu$-average kinetic energy then there should exist a function
$p$ such that $T^{-1}$ is the integrating factor of $dU + p\,dV$.

Boltzmann shows that this is the case and, in fact, $p$ is the $\mu$-average
$\langle -\partial_V \varphi\rangle$ and it is also the average momentum transfer to the walls per unit
time and unit surface, i.e. it is the physical pressure, see Appendix 9.A3. 
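
The discrete picture invoked above, in which the motion is a one-cycle permutation of finitely many cells, can be made concrete in a few lines. The sketch below is a toy illustration only: the number of cells, the random choice of the cycle and of the observable, and all names are assumptions made here. It shows that the time average over one full cycle coincides with the average over the uniform distribution on the cells, whatever the starting cell.

```python
import random


def one_cycle_permutation(n_cells, rng):
    """Return a map cell -> next cell that visits all cells in a single cycle."""
    order = list(range(n_cells))
    rng.shuffle(order)  # random order in which the cells are visited
    nxt = [0] * n_cells
    for i, cell in enumerate(order):
        nxt[cell] = order[(i + 1) % n_cells]
    return nxt


def time_average(observable, nxt, start, n_steps):
    """Average of the observable along n_steps of the discrete motion."""
    total, cell = 0.0, start
    for _ in range(n_steps):
        total += observable[cell]
        cell = nxt[cell]
    return total / n_steps


rng = random.Random(0)
n_cells = 1000
nxt = one_cycle_permutation(n_cells, rng)
observable = [rng.random() for _ in range(n_cells)]  # one value per cell

uniform_average = sum(observable) / n_cells
cycle_average = time_average(observable, nxt, start=17, n_steps=n_cells)
print(uniform_average, cycle_average)
```

Averaging over fewer steps than the cycle length gives, in general, a different number; this is the discrete counterpart of the time scale problem discussed in §1.7.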

This is not a proof that the equilibria are described by distributions $\mu$
of the microcanonical ensemble. However it shows that for most systems, independently of the number of degrees of freedom, one can define a
mechanical model of thermodynamics, i.e. one can define various averages
of mechanical quantities and name them with names of thermodynamical
functions, and check that they satisfy the relations that would follow from
classical thermodynamics, see Chap.I,II for more details.


Thermodynamic relations are, therefore, very general and simple consequences of the structure of the equations of motion. They hold for small
and large systems, from one degree of freedom (the case of Helmholtz’ monocycles) to $10^{23}$ degrees and more (the case of a gas in a box).


The above arguments, based on a discrete view of phase space, suggest how- 
ever that, in general, the thermodynamic relations hold in some approximate 
sense, as we have no idea of the precise nature of the discrete phase space. 
However, in some cases, they may hold exactly even for small systems, if 
suitably reformulated: for instance in the 1884 paper, [Bo84], Boltzmann 
shows that in the canonical ensemble the relation (9.1.1) (i.e. the second 
law) holds without corrections even if the system is small, as explained in 
Chap.II.

Thus the ergodic hypothesis does help in finding out why there are mechan- 
ical “models” of thermodynamics: they are ubiquitous, in small and large 
systems alike, but usually such relations are of interest in large systems and 
not really in small ones. 

A critical comment and a warning is important at this point: for large 
systems any theory claiming to rest on the ergodic hypothesis may seem 
bound to fail, see §1.7, because if it is true that a system is ergodic, it is
also true that the time the system takes to go through one of its cycles is 
simply too long to be of any interest and relevance: this was pointed out 
very clearly by Boltzmann, [Bo96], and earlier by Thomson, [Th74]. 

The reason why we observe approach to equilibrium over time scales far
shorter than the recurrence times is the property that the microcanonical ensemble is such that on most of phase space the
observables, whose averages yield the pressure and temperature and
the few remaining other thermodynamic quantities, assume the same value,
[La72], and p. 206 of [Bo74]. This implies that such values coincide with
the average and therefore satisfy the heat theorem.


2 Which is the only invariant distribution if one accepts the above discrete point of view,
probably Boltzmann’s.


The ergodic hypothesis loses its importance and fundamental nature and it 
appears simply as a tool used in understanding that some of the relations that 
we call “macroscopic laws” hold in the same form for all systems, whether 
small or large. 


§9.2. Timed Observations and Discrete Time


The question that we shall now investigate is whether there can be anything 
similar to the above done out of equilibrium but still in a stationary state: 
are there statistical properties that hold for small and large systems alike 
under the “only” assumption that the systems evolve in a very disordered, 
“chaotic”, way? If so such properties might have physical relevance for large 
systems: when the system is large they may become observable because 
they may become a property of most of the individual configurations of the 
system without need of time averaging, just as it happens in equilibrium. 

And they might be checked, and perhaps even be interesting, in small sys- 
tems which become therefore a natural testing ground, mainly because of 
the availability of fast computer experiments, just as it happens in equilib- 
rium with the ergodic hypothesis, which is usually tested in systems with 
very few degrees of freedom. 

The first step in the investigation is a dynamical hypothesis on the nature 
of the motions of complex systems (like a gas in a box). This hypothesis 
has developed quite slowly in the past epoch: it developed from the theory 
of complex motions in fluid mechanics and it was formulated by Ruelle 
in the early 1970s (1973) and written explicitly later, [Ru80], [Ru76]. It 
influenced research strongly, see for instance [ECM90],[ECM93]; and it led 
to some concrete results, after being reformulated and put in the context of 
nonequilibrium statistical mechanics, “much later” [GC95]. The hypothesis 
will be stated below and is called the chaotic hypothesis. 


To proceed to the formulation of the hypothesis we need to set up a conve- 
nient kinematic description of disordered motions, convenient for the study 
of chaotic evolutions: this is necessary because the usual kinematics is well 
suited for orderly motions but is insufficient for disordered ones. 

In Chap.I we have already hit the difficulty of a proper representation of 
the evolution of a system of N particles in a box V as a permutation of 
phase space cells. The difficulty came from the hyperbolic nature of the 
evolution that stretches some coordinates and contracts others. This forced 
us to use very small cells and very small time intervals as phase space and 
time units. 
The division in cells of phase space is, in this way, extremely fine (although,
as we discussed, there are conceptual limits to the precision and reliability 
that one can reach in this way), so fine that we can regard space as discrete. 

It is interesting to change viewpoint and, keeping the conception of space 
as a continuum, to try using a discrete representation of motion based on 
phase space cells that are not as small as possible (but that are nevertheless 
still compatible with the discrete picture that we follow after Chap.I.): we 
shall call such a description a “coarse grained” one.

In other words it may be more convenient to use larger phase space cells 
and to find a description of dynamics in terms of them, which explicitly 
and “exactly” still takes into account the (often) “hyperbolic” nature of the 
evolution, i.e. its (usual) high instability or its chaoticity. 

The use of the “coarse grained” cells brings to mind something “approx- 
imate” and not too well defined. Here we do not want to convey such 
intuition and the representation of motion, that we look for, will be in prin- 
ciple as exact as wished, and involve no approximation at all. The modern 
efforts to clarify the notion of coarse graining can be traced back to Krylov, 
[Kr79], whose work strongly influenced Sinai, [Si79], whose work in turn 
influenced Ruelle leading him, eventually, to his hypothesis. 

We consider a system evolving on a bounded surface $\Sigma$ and with the evolution acting near a given point $x$ by expanding some line elements and by
contracting some others. The evolution will be described by a map $S$ that
can be thought of as being obtained from the time evolution flow $x \to S_t x$
by monitoring it every time some special event happens (for instance a collision between some pair of particles). In this way the surface $\Sigma$ consists of
the collection of the special events that one monitors, which will be called
timing events or monitored events.

The geometrical meaning of the construction of $S$ from the flow $S_t$ is illustrated in Fig. 9.2.1 where a trajectory $x \to S_t x$ in the phase space for
the evolution in continuous time (“usual phase space”) is depicted together
with the surface $\Sigma$ consisting of the monitored events (which is the phase
space where the motion is described by the map $S$: the map $S$ acts, in
Fig. 9.2.1, on the monitored event $\xi$ mapping it into $S\xi$ and $S^2\xi$).
[Fig. 9.2.1: the $x$-trajectory in the usual phase space and the monitored events $\xi$, $S\xi$, $S^2\xi$ at which it crosses the surface $\Sigma$.]


Line elements on $\Sigma$ are transformed by the time evolution $S$ and some of
them become longer and others shorter (or keep their length). Note that in
order to measure the length of a line element we need a metric defined on $\Sigma$:
we shall take the natural metric induced on the surface $\Sigma$ by the metric in
the space in which $\Sigma$ lies (which is usually the Euclidean space of position
and velocity vectors of the system particles). But of course the metric that
we use is rather arbitrary and we have to be careful: it is best to try dealing
only with notions that turn out to be metric independent.

In the following we shall always use the above discrete time evolution, but
all that we say can be quite easily translated in terms of properties of the
continuous time flow $S_t$.³


§9.3. Chaotic Hypothesis. Anosov Systems


Boltzmann’s equation and Brownian motion theory are examples of at- 
tempts at studying nonequilibrium problems. The first has the ambition of 
discussing the approach to equilibrium, while the second deals with motions 
that take place in equilibrium. 

In general we shall say that a system is in a nonequilibrium situation when- 
ever nonconservative external forces act on it and, usually, sustain macro- 
scopic motions. Such systems will evolve and reach in due course a station- 
ary state, which will not be one of the equilibrium states with which we are 
familiar from the preceding chapters. 

This is because time evolution will ultimately be described by differential
equations which will contain dissipative terms and consequently phase space
volume will be contracting; hence probability distributions corresponding
to the statistics of stationary states must be concentrated on sets with zero
volume.


3 One should not confuse this map with the map used in Chap.I for the one time step
evolution. Two successive timing events still contain very many time steps in the sense
of Chap.I: here we do not discretize time or space. Motion appears evolving discretely
simply because we choose to observe it from time to time, when something that we
consider interesting happens.

Before attempting a general theory of the approach to an equilibrium state 
or, more generally, to a stationary state it seems reasonable to study the 
properties of the stationary states themselves, end products of the evolution 
under external driving forces (which may vanish, however, so that equilibrium theory will still be a “special case”).

This will include investigating phenomena like Brownian motion which 
can be regarded as dynamical properties of equilibrium states, as well as 
genuinely nonequilibrium phenomena, like thermodynamical relations be- 
tween quantities that can be defined in systems in nonequilibrium stationary 
states. 

And one can imagine stretching the analysis to stationary states of the 
macroscopic equations that are supposed to be obeyed out of equilibrium, 
like the Euler or Navier-Stokes equations for systems macroscopically be- 
having as fluids. 

Approach to equilibrium or to a stationary state is likely to be a more 
difficult problem and it will be set aside in most of what follows. Hence we 
shall not deal with states of systems evolving in time: rather we refer to 
properties of states that are already in a stationary state under the influence 
of external nonconservative forces acting on them. For instance think of an 
electric circuit in which a current flows (stationarily) under the influence 
of an electromotive field, or of a metal bar with two different temperatures 
fixed at the extremes; and one can even think of a Navier-Stokes fluid in a 
(stationary) turbulent Couette flow or a more general flow. What follows 
applies also to such apparently different systems (and in fact the basic ideas 
were developed having precisely such systems in mind). 

The first two systems, regarded as microscopic systems (i.e. as mechanical 
systems of particles), do certainly have very chaotic microscopic motions 
even in the absence of external driving (while macroscopically they are in 
a stationary state and nothing happens, besides a continuous, sometimes 
desired, heat transfer from the system to the surroundings). The third 
system also behaves, as a macroscopic system, very chaotically at least 
when the Reynolds number is large. 

A basic problem is that the situation is quite different from that in which 
Boltzmann was when attempting a microscopic proof of the heat theorem: 
there is no established nonequilibrium thermodynamics to guide us. 

The great progress of the theory of stationary nonequilibrium that took
place in the past century (the XX-th), at least that which was unanimously
recognized as such, concerns only properties of incipient nonequilibrium:
i.e. transport properties at vanishing external fields (we think here of On- 
sager’s reciprocity and of its more quantitative form given by the Green- 
Kubo transport theory). So it is by no means clear that there is any general 
nonequilibrium thermodynamics. 
Nevertheless in 1973 a first suggestion that a general theory might be
possible for nonequilibrium systems in stationary and chaotic states was 
made by Ruelle, eventually written down only later in [Ru78c], [Ru80]. 
This suggestion was made with full understanding of its ambition: 


“If one is optimistic, one may hope that the asymptotic measures will play 
for dissipative systems the sort of role which the Gibbs ensembles played for 
statistical mechanics. Even if that is the case, the difficulties encountered 
in statistical mechanics in going from Gibbs ensembles to a theory of phase 
transitions may serve as a warning that we are, for dissipative systems, not 
yet close to a real theory of turbulence”, [Ru78c]. 


The proposal is very ambitious because it suggests a general and essen- 
tially unrestricted answer to what should be the ensemble that describes 
stationary states of a system, whether in equilibrium or not. In the recent 
formulations of Cohen and Gallavotti it reads: 


Chaotic hypothesis: for the purpose of studying macroscopic properties, the 
time evolution map S of a many-particle system can be regarded as a mixing 
Anosov map.⁴


We defer discussing in detail the technical notion of “mixing Anosov map” 
to the coming sections and the hypothesis is written here only for con- 
creteness and later reference. For the moment it will suffice to say that 
mixing Anosov maps are the paradigm of chaotic motions: they are well 
understood dynamical systems which show chaotic behavior in the “purest” 
possible way. They play a role in nonlinear dynamics very similar to that 
played by harmonic oscillators in the theory of stable motions. 


Remark: If the evolution is very dissipative and motions tend to an attract- 
ing set smaller than the whole phase space the hypothesis may be interpreted 
as meaning that the attracting set can be regarded as a smooth surface and 
that the restriction of the evolution to it is a mixing Anosov map, see below 
(which is a very special case of a wider class of chaotic systems called Axiom 
A systems), [BGG97],[BG97]. However a less strict interpretation could be 
to say that the attractor is an “Axiom A attractor”, see below. 


The ergodic hypothesis led Boltzmann to the general theory of ensembles 
(as acknowledged by Gibbs, p. vi in [Gi81], whose work has been perhaps 
the main channel through which the allegedly obscure works of Boltzmann 
reached us): besides giving the second law, (9.1.1), it also prescribed the 
microcanonical ensemble for describing equilibrium statistics. 

The reasoning of Ruelle was that from the theory of simple chaotic systems
one knew that such systems, just by the fact that they are chaotic, will reach
a “unique” stationary state. Therefore simply assuming chaoticity would
be tantamount to assuming that there is a uniquely defined ensemble which
should be used to compute the statistical properties of a stationary system
out of equilibrium.


4 This is a notion that in the original work [GC95] was called “transitive Anosov map”;
however it turns out that the established nomenclature is different and here I try to
adhere to it, as much as possible.

This argument is based on the idea that a chaotic system, even when it is 
not exactly a mixing Anosov map in the mathematical sense, does share its 
main qualitative features, [Ru78c], in a sense similar to the one in which in 
stability theory one infers properties of nonlinear oscillations from those of 
harmonic oscillators or of integrable systems. And a key property of mixing 
Anosov maps is that their motions show a unique statistics $\mu$. This means
that there is a unique probability distribution $\mu$ on phase space $\mathcal{F}$ such that
for all (smooth) observables $F(x)$:

$$\lim_{T\to\infty} \frac{1}{T} \sum_{k=0}^{T-1} F(S^k x) = \int_{\mathcal{F}} F(y)\,\mu(dy) \tag{9.3.1}$$

apart from a set of zero volume of initial data $x \in \mathcal{F}$.⁵

The distribution $\mu$ is called the SRB distribution or the statistics of motions: it was proven to exist by Sinai for Anosov systems and the result
was extended to the much more general Axiom A attractors by Ruelle and 
Bowen, [BR75], [Ru76]. Natural distributions were, independently, discussed 
and shown to exist, [LY73], for other (related and simpler) dynamical sys- 
tems, although in an apparently less general context and with a less general 
vision of the matter, [Si70],[BR75]. 
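
The content of (9.3.1) can be illustrated on the simplest example of a mixing Anosov map, Arnold's cat map of the two-dimensional torus. This example is not taken from the text: the map, the observable, the initial data and the number of steps are assumptions made here for illustration. Since the cat map preserves area, its SRB distribution is the uniform distribution on the torus, so the time average of $F(x,y) = \cos^2(2\pi x)$ should approach its space average $\frac{1}{2}$ for almost all initial data.

```python
import math


def cat_map(x, y):
    """One step of Arnold's cat map on the unit torus."""
    return (2.0 * x + y) % 1.0, (x + y) % 1.0


def observable(x, y):
    """Smooth observable whose average over the uniform distribution is 1/2."""
    return math.cos(2.0 * math.pi * x) ** 2


def time_average(f, x, y, n_steps):
    """Left-hand side of (9.3.1), truncated at n_steps iterations of the map."""
    total = 0.0
    for _ in range(n_steps):
        total += f(x, y)
        x, y = cat_map(x, y)
    return total / n_steps


# Two arbitrary initial data; both time averages should be close to 1/2.
for x0, y0 in [(0.1234, 0.5678), (0.9, 0.05)]:
    print((x0, y0), time_average(observable, x0, y0, 200_000))
```

The points of the torus with rational coordinates have periodic orbits and form part of the zero volume set of exceptional data; in floating point arithmetic the orbits are eventually periodic as well, but with periods far longer than the runs used here.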

Therefore one is, at least in a very theoretical way, in a position to inquire 
whether such a unique ensemble has universal properties valid for small and 
large systems alike (of course we cannot expect too many of them to hold, 
but even a single one would be interesting). 

In fact in equilibrium theory the only universal property is precisely the 
heat theorem, besides a few general (related) inequalities (e.g. positivity of 
the specific heat or of compressibility). The theorem leads, indirectly as we 
have seen, to the microcanonical ensemble and then, after one century of 
work, to a rather satisfactory theory of phenomena like phase transitions, 
phase coexistence and universality. 

From this point of view the criticized (and more and more often dismissed 
as, at best, unnecessary) ergodic hypothesis assumes a new status and 
emerges as greatly enhanced. Ruelle’s proposal seems to be its natural 
(and, perhaps, the unique) extension out of equilibrium. 

The proposal was formulated in the case of fluid mechanics: but it is so
clearly more general that the reason why it was not explicitly referring
to statistical systems is, probably, that, as a principle, it
required some “check” if formulated for statistical mechanics. As originally
stated, and without any further check, it would have been analogous to the
ergodic hypothesis without the heat theorem (or other consequences drawn
from the theory of statistical ensembles).


5 In general replacing $S$ by $S^{-1}$ in (9.3.1) leads, when the new limit exists, to a different
probability distribution on phase space, which we can call the statistics towards the past,
while (9.3.1) defines the statistics towards the future. One cannot expect that, when
they exist, the two statistics coincide.

Of course the chaotic hypothesis will suffer from the same objections that 
are continuously raised about the ergodic hypothesis: namely “there is the 
time scale problem”, see §1.7.

To such objections the answer given by Boltzmann should apply unchanged: 
large systems have the extra property that the interesting observables take 
the same value in the whole (or virtually whole) phase space. Therefore 
their values satisfy any relation that is true no matter whether the system is 
large or small: such relations (whose very existence is, in fact, surprising) 
might even be of no interest whatsoever in small systems (like in the above- 
mentioned Boltzmann’s rigid Saturn ring, or in his other similar example of 
the Moon regarded as a rigid ring rotating about the Earth). 

Evidence for the nontrivial applicability of the hypothesis built up and
it was repeatedly hinted at in various papers dealing with numerical experiments, mostly on very few particle systems ($< 100$ to give an indication), [HHP87]. In attempting to understand one such experiment,
[ECM93], the above “formal” interpretation of Ruelle’s principle was formulated, [GC95], for statistical mechanics (as well as for fluid mechanics,
replacing “many-particle system” with “turbulent fluid”).

The hypothesis was made first in the context of reversible systems: they 
were in fact the subject of much of the experimental work that bloomed 
once the importance and relevance of reversibility was strongly stressed,
and supported by experiments, by Hoover and coworkers, for highlights
see [HHP87],[EM90],[ECM90],[DPH96]. Note that saying that reversibility
can be relevant to, and even facilitate, the analysis of dissipative motions 
is highly nontrivial and it required insight and intellectual courage to be 
introduced. 

The strict interpretation of the chaotic hypothesis given in the above
remark rules out, when the attracting set is smaller than phase space,
attractors⁶ with a fractal closure (i.e. attracting sets which are not smooth
surfaces); to include them, one should replace, in the formulation of the
chaotic hypothesis, the word “Anosov” with “Axiom A” (a similar but much
weaker notion): but it seems appropriate to wait and see if there is real need
of such an extension. (It is certainly an essential extension for small systems,
but it is not clear to me how relevant fractality could be when the system
has $10^{23}$ particles.) Therefore we shall concentrate our attention mainly on
Anosov systems.


6 It is important to distinguish between attracting set and attractor. The first is a closed
set such that all points close enough to it evolve in time keeping a distance from it
that tends to 0 as $t \to +\infty$, and which furthermore is “minimal” in the sense that it
does not contain subsets with the same properties. Consider, to define an attractor, an
attracting set which admits a statistics $\mu$ given by (9.3.1) for all but a zero volume set
of nearby points $x$. Any subset $C$ of such an attracting set with $\mu(C) = 1$ is called an
attractor. More generally we can imagine choosing initial data $x$ near an attracting set
with a probability distribution $\mu_0$ that can even be concentrated on sets of zero volume,
i.e. that is completely different from the volume measure. Suppose that all data $x$ but
a set of zero $\mu_0$ measure satisfy (9.3.1) (of course, in general, with a statistics $\mu'$ different
from $\mu$ and $\mu_0$-dependent). Then the subsets of the attracting set that have probability 1
with respect to the statistics generated by such initial data $x$ will be called “attractors
for the data with distribution $\mu_0$”, and they may have 0 measure with respect to the
statistics $\mu$ defined by (9.3.1). We see that in general the notion of attracting set is
uniquely determined by the dynamics, while the notion of attractor depends also on
which initial data we are willing to consider; and even once the class of initial data is
chosen the notion of attractor is not uniquely defined as we can always take out of an

Finally I add one more comment on the words at the beginning of the 
chaotic hypothesis “for the purpose of ...”: an easy critique could be that 
this is “vague” since it is obvious that virtually none of the systems of 
interest in statistical mechanics (or fluid dynamics) are Anosov systems 
in the mathematical sense (they are often not smooth, or obviously not 
hyperbolic and often not even ergodic). 

Nevertheless I think that the hypothesis is well founded and it has an 
illustrious predecessor in the early ergodic hypothesis of Boltzmann: he was 
trying to prove the heat theorem; he needed the condition that the motions 
were periodic; he said that if they were not they could still be considered 
so for practical purposes “because a nonperiodic orbit can be regarded as 
periodic with infinite period”. 

Assuming that the motions were periodic (i.e. the systems were “monocyclic”)
led him to discover a hitherto unknown property of periodic motions
(the heat theorem). Indeed things “went as if the motions were periodic”!,
[Bo66], a first rough formulation of the ergodic hypothesis. We have discussed
at length the interpretation and the interest of the ergodic hypothesis
in Chap. I, III, and we have seen that it leads to important relations between
averages; one can think that “all systems are Anosov” in the same
sense.

Mathematically what is being said is that there might be general properties 
of Anosov systems that might have been missed in spite of the vast research
on the subject. 


§9.4. Kinematics of Chaotic Motions. Anosov Systems


To proceed we need a more precise formulation of the notion of Anosov 
systems and an analysis of the kinematics of their motions. We give here 
an informal definition and in the next section we give a detailed discussion 
of the kinematics of motions. 


Definition: A mixing Anosov system is, see p.55 of [AA68], a smooth map
S (i.e. of class $C^\infty$) of a smooth manifold M (“phase space”) such that around
every point x one can set up a local coordinate system with the following
properties associated with it:


(a) depends continuously on x and is covariant (i.e. it follows x in its 
evolution) and 


attractor one orbit and still have an attractor (unless the attractor consists of finitely
many points).

(b) is hyperbolic, i.e. transversally to the phase space velocity of any chosen
point x the motion of nearby points looks, when seen from the coordinate
frame covariant with x, like a hyperbolic motion near a fixed point. This
means that in a sphere of small radius δ around x there will be a connected
“local stable coordinate surface”, the stable local manifold $W_x^s$ through x,
whose points have trajectories that get close to the trajectory of x at exponential
speed as the time tends to +∞, and a “local unstable coordinate
surface”, the local unstable manifold $W_x^u$, whose points have trajectories
that get close to the trajectory of x at exponential speed as the time tends
to −∞. Furthermore the exponential speed of approach must admit a bound
independent of x, i.e. it has to be “uniform”.


(c) the global stable manifold and the global unstable manifold of every
point, i.e. the sets $\bigcup_{n=0}^{\infty} S^{-n} W^s_{S^n x}$ and $\bigcup_{n=0}^{\infty} S^{n} W^u_{S^{-n} x}$, are dense in M. This
excludes the possibility that the phase space consists of disconnected invariant
parts. It also excludes the case that it consists of n disconnected parts
$M_1,\ldots,M_n$ cyclically permuted by S so that $S^n$ is a mixing Anosov map as
a map of $M_j$ into itself, clearly a case in which one is “improperly defining”
phase space and evolution map (which should be rather defined as $M_1$, say,
and $S^n$ respectively).


If only (a),(b) hold the system is simply called an “Anosov system”. It is 
a theorem (Anosov) that the planes tangent to $W_x^u, W_x^s$ are quite smoothly
dependent on x; they are Hölder continuous in x (in general not more,
however, because in general they are not differentiable in x even if the map 
S is analytic), [AA68],[Ru89]. 

If the system is described in continuous time the direction parallel to the 
velocity has to be regarded as an “extra” neutral direction where, on the 
average, no expansion nor contraction occurs. However here we shall adhere 
to the discrete viewpoint based on timed observations, see §9.1. For a
discussion of the continuous time point of view see [BR75], [Bo74], [Ge98]. 

The simple but surprising and deep properties of Anosov maps are by and 
large very well understood, [Ru89]. Unfortunately they are not as well 
known among physicists as they should be: many seem confused by the 
language in which the above concepts are usually presented; however it is a 
fact that such remarkable mathematical objects (i.e. Anosov systems) have 
been introduced by mathematicians, and physicists must, therefore, make 
an effort at understanding the new notion and its physical significance. 

In particular, as mentioned above, if a system is Anosov, for all observables
F (i.e. continuous functions on phase space) and for all initial data x,
outside a set of zero volume, the time average of F exists and can be computed
by a phase space integral with respect to a distribution μ uniquely
determined on phase space, as expressed by (9.3.1).

Clearly the chaotic hypothesis solves in general (i.e. for systems that can
be regarded as “chaotic”) the problem of determining which is the ensemble
to use to study the statistics of stationary systems in or out of equilibrium
(it clearly implies the ergodic hypothesis in equilibrium), in the same sense
in which the ergodic hypothesis solves the equilibrium case.

The chaotic hypothesis might turn out to be false in interesting cases: 
like the ergodic hypothesis which does not hold for the simplest systems 
studied in statistical mechanics, like the free gas, the harmonic chain and 
black body radiation. Worse, it is known to be false for trivial reasons in 
some systems in equilibrium (like the hard core gas), simply because the 
Anosov map definition requires smoothness of the evolution and systems 
with collisions are not smooth systems (in the sense that the trajectories 
are not differentiable as functions of the initial data). 

However, interestingly enough, the hard core systems are perhaps the ob- 
ject closest to an Anosov system that can be thought of, being at the same 
time of statistical mechanical relevance, [GG94], [ACG96], to the extent 
that there seem to be no known “physical” properties that this system does 
not share with an Anosov system. Aside from the trivial fact that it is not 
a smooth system, the hard core system behaves, for statistical mechanics 
purposes, as if it were a (mixing) Anosov system. Hence it is the prototype
system to study in looking for applications of the chaotic hypothesis. 

In fact if the system is smooth one can imagine, and sometimes prove 
rigorously, see [RT98], that phase space contains big nonchaotic islands 
where the system looks very different from an Anosov system: it is important
to understand the relevance of such regions for the statistical properties of 
(moderately) large systems because the chaotic hypothesis states that they 
should not be too relevant for nonequilibrium theory. 

The problem that “remains” is whether the chaotic hypothesis has any 
power to tell us something about nonequilibrium statistical mechanics. This 
is the real, deep, question for anyone who is willing to consider the hypoth- 
esis. Of course one consequence is the ergodic hypothesis, hence the heat 
theorem, but this is manifestly too little even though it is a very important 
property for a theory with the ambition of being a general extension of the 
theory of equilibrium ensembles. 

A chaotic motion, as discussed in §9.2, is recognized from the expansion
and contraction that the evolution map x → Sx produces on line elements
of phase space (the space of the events).

However the notion of expansion and contraction depends on the metric 
that we use near x and Sx to measure lengths; hence it is clear that expan- 
sion and contraction at x are not definable in terms of x and of the action 
of S near x alone. The latter are, in fact, local notions, but it will make
sense to say that a line element δ emerging from x (lying on phase space) “expands”
if it does so asymptotically, i.e. if:


$$|S^n\delta|_{S^n x} > C e^{\lambda n}\,|\delta|_x \qquad \text{for all } n > 0 \tag{9.4.1}$$


where $|\cdot|_x$ denotes the length of δ measured with the metric used at the
point x of phase space and C, λ > 0 are suitable constants. Likewise one has to reason
in the case of contraction.

The collection of the expanding line elements generates a plane $T_x^u$ (tangent
to phase space) called the expanding plane at x and the collection of the contracting
line elements generates another plane $T_x^s$. This can be illustrated in a two-dimensional
case by the first of Fig. 9.1.1 where two line elements emerging
from a point x of phase space are directed along the expanding and the contracting
directions.




Here the two tangent planes emerging out of x are drawn as two arrows:
one has to imagine that the planes can be drawn at all points of phase space; in fact
the family of planes $T_x^s$ (or $T_x^u$) can be integrated and generates a family of
smooth manifolds $W_x^s$ (or $W_x^u$) that are tangent at each of their points y
to the plane $T_y^s$ (or $T_y^u$). The second drawing in Fig. 9.1.1 illustrates this
property and shows the beginning of the two manifolds through x, i.e. a part
of the manifolds that is enclosed in a sphere of radius δ, small compared to
the curvature of the manifolds.

Fig. 9.1.1

More generally we can hope that it will be possible to give a decomposition
of the tangent plane $T_x$ at a point x as a sum of linearly independent planes
$T_x^j$:

$$T_x = T_x^1 \oplus T_x^2 \oplus \cdots \oplus T_x^p \tag{9.4.2}$$

of dimensions $n_1,\ldots,n_p$ and to define p Lyapunov exponents $\lambda_1,\ldots,\lambda_p$ such
that:

$$\lim_{n\to+\infty} \frac{1}{n} \log |S^n\delta|_{S^n x} = \lambda_j \qquad \text{if } \delta \in T_x^j . \tag{9.4.3}$$


If such a decomposition and such exponents exist we say that the point x 
admits a dynamical base for the evolution S “towards the future”. 

If $\partial S^n$ denotes the matrix of the derivatives of $S^n$ evaluated at x then
the spaces $T_x^1,\ldots,T_x^p$ can be taken to be the eigenspaces of the matrix
$\lim_{n\to+\infty} ((\partial S^n)^*(\partial S^n))^{1/2n}$, if this limit exists: however it should be clear
that this is not the only way in which the tangent plane can be split so that
(9.4.3) holds.

This is so because the *-operation (i.e. transposition) depends on the system
of coordinates and on the metric: one sees this by noting that $\partial S^n$ maps
the tangent space $T_x$ into a different space, namely $T_{S^n x}$. For instance if
we have a decomposition satisfying (9.4.2) we get a new one by changing
$T_x^1$ into a new plane forming a different (positive) angle with respect to
the plane of the other vectors $T_x^2 \oplus \cdots \oplus T_x^p$; and one can likewise change
$T_x^2, T_x^3, \ldots$

But the sequence of subspaces $T_x^p$, $T_x^p \oplus T_x^{p-1}$, …, $T_x^p \oplus \cdots \oplus T_x^2$, $T_x = T_x^p \oplus \cdots \oplus T_x^1$
is uniquely determined and metric independent (assuming the existence
of a dynamical base towards the future).
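
As a numerical illustration of the limit (9.4.3), the following Python sketch estimates the Lyapunov exponents of a linear map of the torus, namely the map with matrix of rows (1,1) and (1,2) that is discussed as an example in §9.5. The function names, the starting point and the number of iterations are illustrative assumptions and are not taken from the text. The repeated QR factorization is a standard numerical device, used here in place of the matrix limit quoted above; that limit is also evaluated directly, for one moderate value of n, for comparison.

```python
import numpy as np

# Assumed example: a linear torus map, whose derivative matrix is M at every point.
M = np.array([[1.0, 1.0], [1.0, 2.0]])

def step(x):
    """One time step of the map, taken modulo 2*pi."""
    return (M @ x) % (2 * np.pi)

def jacobian(x):
    """Matrix of the derivatives of the map at x (constant for a linear map)."""
    return M

def lyapunov_qr(x0, n_steps=2000):
    """Estimate all exponents by pushing an orthonormal frame along the orbit."""
    x = np.asarray(x0, dtype=float)
    frame = np.eye(x.size)
    log_growth = np.zeros(x.size)
    for _ in range(n_steps):
        # Transport the frame with the derivative, then re-orthonormalize it.
        frame, r = np.linalg.qr(jacobian(x) @ frame)
        log_growth += np.log(np.abs(np.diag(r)))
        x = step(x)
    return log_growth / n_steps

def lyapunov_from_limit(n=10):
    """Logarithms of the eigenvalues of ((dS^n)^* dS^n)^(1/2n) at a fixed n."""
    dSn = np.linalg.matrix_power(M, n)
    # Singular values of dS^n are the square roots of the eigenvalues of (dS^n)^* dS^n.
    singular_values = np.linalg.svd(dSn, compute_uv=False)
    return np.log(singular_values) / n

exact = np.log((3 + np.sqrt(5)) / 2)   # log of the largest eigenvalue of M
qr_estimate = lyapunov_qr([0.3, 1.1])
limit_estimate = lyapunov_from_limit()
print(qr_estimate, limit_estimate, exact)
```

For a nonlinear map one would only have to change `step` and `jacobian`; the frame transport itself does not depend on the map being linear.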

Dynamical bases for the evolution towards the past are defined in the same 
way if S is invertible (we do not describe here the obvious adjustments 
needed in the definition, e.g. n → −∞). The same comments about lack of
uniqueness can be made for such bases. 


On the other hand in the case of invertible maps dynamical bases which 
are such both in the future and in the past may exist and, if they do exist, 
they are uniquely determined; we shall call them simply dynamical bases or 
bilateral dynamical bases. 

It is remarkable that such bilateral bases exist under rather general condi- 
tions. For instance: 


(a) if μ is a probability distribution which is S-invariant⁷ then, apart from
a set of points of zero μ-probability, every point admits a dynamical base,
and


(b) if the distribution μ is ergodic in the sense that there are no nontrivial
measurable functions that are constants of motion then the dimensions of
the planes $T_x^j$, as well as the Lyapunov exponents $\lambda_j$, are x-independent
with μ-probability 1.


Remarks: 
(1) Properties in (a) and (b) are the content of Oseledec’s theorem. 


(2) Dynamical bases for the motion towards the future and those for the
motion towards the past will be different, in general. However if S is invertible
and μ is an invariant distribution the forward and backward dynamical
bases can be chosen to coincide (apart from a set with zero μ-probability) and
their exponents are opposite, see [Ru79a] p. 283.


(3) A caveat is that in (2) it is essential that μ be S-invariant. Therefore
the above is not saying that all points but a set of zero volume will admit a
dynamical base, because in general the volume measure μ₀ is not invariant.
But if μ₀ admits a statistics μ in the sense of (9.3.1) then all points but
a set of μ-measure 0 admit a dynamical base. In other words it is important
to note that in general an invariant probability distribution μ is not
expressible by means of a density function in phase space (one says that it
is not “absolutely continuous” or that it is “nonsmooth”), and therefore one cannot
say that the latter statement (b) holds “apart from a set of points with zero
volume”, even when the volume distribution μ₀ admits a statistics μ.


7 i.e. the probability μ(E) of E and that of SE are equal for all sets E.


Considering a dynamical system the question of the existence of the stable
and unstable manifolds or just of the local stable and unstable manifolds is
of course rather delicate and one would like to have simple criteria guaranteeing
their existence and a few basic properties. The real problem is with
the local manifolds, $W_x^{s,\delta}, W_x^{u,\delta}$, in a sphere of radius δ around a point x. In
fact once the local manifolds are defined one can define the global expanding
and contracting manifolds even when S is not invertible, simply as


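$$\begin{aligned}
W_x^u &= \bigcup_{n\ge 0} S^n W_{x_n}^{u,\delta} \quad \text{if } S^n x_n = x \\
W_x^s &= \{\text{set of points } y \text{ such that } S^n y \in W_{S^n x}^{s,\delta} \text{ for } n \text{ large enough}\}
\end{aligned} \tag{9.4.4}$$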


where the first union is meant, when S is not invertible, also as a union over 
the various possibilities of choosing the $x_n$’s.

The existence of the local hyperbolic structure may seem a property difficult
to check. This is indeed so; but in many cases the proof is greatly
simplified because of the following sufficient conditions.

The existence of the local manifolds $W_x^{s,\delta}, W_x^{u,\delta}$ can be deduced from the
existence of a continuous family of cones $\Gamma_x^u, \Gamma_x^s$ lying in the tangent plane
to every point x with the property that a displacement δ that points into
the cone $\Gamma_x^u$ will be transformed by the evolution map S inside a cone which
is strictly less wide than the cone $\Gamma_{Sx}^u$ and will have a length |Sδ| which is
strictly larger than that of δ, and a corresponding property holds for the cone
$\Gamma_x^s$, see Fig. 9.4.2 below.

Here “strictly” means, see Fig. 9.4.2, that the length will be larger by a factor
λ > 1 with respect to the initial length and λ will be independent of the
point x; and, thinking that the cone is determined by its intersection with
the unit sphere in $T_x$, “strictly less wide” will simply mean that the image
$S\Gamma_x^u$ intersects the unit sphere of $T_{Sx}$ in a set which has a distance r > 0 to
the boundary of $\Gamma_{Sx}^u \cap T_{Sx}$, with r being x-independent.

The first of Fig. 9.4.2 illustrates the parts of the pair of cones $\Gamma_x^u, \Gamma_x^s$ around x
(shaded sectors) contained inside a small sphere around x. The evolution S
maps them into the shaded sectors of the second figure, so that the expanding
cone (marked by u) ends “well inside” the corresponding one around
Sx (unshaded) while the contracting cone widens around the corresponding
cone for Sx.

A similar property is required for the cones $\Gamma_x^s$, by using $S^{-1}$ instead of
S.

The existence of families of cones is sufficient for the system to be an
Anosov system, but it is a much simpler condition, conceptually, and in
fact it can even be easily satisfied in some cases, [Ru79], [Li95].
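
To make the cone condition concrete, here is a small Python sketch that checks it numerically for the linear torus map with matrix of rows (1,1) and (1,2), the example discussed in §9.5. The choice of cones is an assumption made for illustration only: the first quadrant (with its opposite) as candidate expanding cone, the second quadrant (with its opposite) as candidate contracting cone. The helper name `cone_check` is hypothetical. For each cone the sketch returns the smallest stretching factor and the angular distance between the image cone and the boundary of the original cone.

```python
import numpy as np

M = np.array([[1.0, 1.0], [1.0, 2.0]])   # assumed example map (linear, so dS = M)

def cone_check(A, lo, hi, samples=1001):
    """Sample unit vectors with angle in [lo, hi] and apply the matrix A.

    Returns (smallest |A v| / |v|, angular margin of the image inside [lo, hi]).
    A positive margin means the image cone is strictly less wide.
    """
    angles = np.linspace(lo, hi, samples)
    unit = np.stack([np.cos(angles), np.sin(angles)])   # shape (2, samples)
    images = A @ unit
    growth = np.linalg.norm(images, axis=0)
    image_angles = np.arctan2(images[1], images[0])
    margin = min(image_angles.min() - lo, hi - image_angles.max())
    return growth.min(), margin

# Expanding cone: tested with the map itself.
grow_u, margin_u = cone_check(M, 0.0, np.pi / 2)
# Contracting cone: tested with the inverse map, as required for the stable cones.
grow_s, margin_s = cone_check(np.linalg.inv(M), np.pi / 2, np.pi)
print(grow_u, margin_u, grow_s, margin_s)
```

Since the map is linear the same cones work at every point, so the bounds are automatically x-independent; for a nonlinear map the check would have to be repeated with the derivative matrix at each point.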


§9.5. Symbolic Dynamics and Chaos


The key property of mixing Anosov systems is that of admitting a 
“Markov’s partition”; this is an important discovery that was realized in 
the classical works of Sinai, [Si68], [Si72], heralded by the independent dis- 
covery of a special case, [AW68]. We devote this section to the illustration 
of this important geometrical notion and to the consequent “symbolic dy- 
namics”, or “coarse grained” representation of motion. 

The geometric properties of Anosov systems allow us to imagine a partition 
of phase space into rectangular cells $E_1,\ldots,E_{\mathcal N}$ such that, see Fig. 9.5.1:


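(i) Each cell is defined by a “center” c and two “axes” $\Delta^u, \Delta^s$: it consists
of the points z(ξ, η) which have the form $z(\xi,\eta) = W_\xi^{s,\delta} \cap W_\eta^{u,\delta} \stackrel{def}{=} \xi \times \eta$ for
some ξ ∈ $\Delta^u$ and η ∈ $\Delta^s$. The boundary ∂E of such a cell E will, therefore,
consist of a shrinking part (or “stable part”) $\partial^s E = \partial\Delta^u \times \Delta^s$ and of an
expanding part (or “unstable part”) $\partial^u E \stackrel{def}{=} \Delta^u \times \partial\Delta^s$.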

The geometrical construction is illustrated in the two-dimensional case in
Fig. 9.5.1. The circles are a neighborhood of c = x of size δ very small
compared to the curvature of the manifolds (so that they look flat); Fig.
9.5.1a shows the axes; Fig. 9.5.1b shows the × operation and $W_\eta^{u,\delta}, W_\xi^{s,\delta}$
(the horizontal and vertical segments through η and ξ, respectively, have size
δ); Fig. 9.5.1c shows the rectangle E with the axes (dotted lines) and with
the four marked points being the boundaries $\partial\Delta^u$ and $\partial\Delta^s$. The picture
refers to the two-dimensional case (which is substantially easier to draw and
to conceive, see [Bo70]), and the stable and unstable manifolds are drawn as
flat, i.e. the Δ’s are very small compared to the curvature of the manifolds.
Transversality of $W_x^u, W_x^s$ is pictorially represented by drawing the surfaces
at 90° angles:


Fig. 9.5.1 (panels a, b, c)


(ii) Furthermore we require a covariance property of the various cells with
respect to the action of the evolution map S: i.e. we demand that the map
S transforms the shrinking parts of the boundary of E inside the union of
the shrinking parts of the various cells of the partition, and $S^{-1}$ also enjoys
the corresponding property (with the collection of “expanding” sides of the
cells now containing the $S^{-1}$ images of the “expanding” sides of each cell).
The covariance property is illustrated by Fig. 9.5.2.


Fig. 9.5.2 


A partition into such cells is called a Markov partition. Such partitions 
enjoy remarkable properties of covariance under the time evolution and are 
suitable for a description of the motion. 

Note that here “shrinking” and “expanding” surface elements are repre- 
sented in a “literal” sense. However since these are asymptotic notions 
(which only as such are metric independent) it may well be that under the 
action of S a “shrinking side” actually expands in the metric used, or an 
“expanding side” actually contracts: however under repeated applications 
of S such surface elements do eventually behave as the words, and drawings, 
we use suggest. 

Before proceeding it seems useful to discuss briefly an example. It is the
map of the two-dimensional torus $T = [0,2\pi]^2$ defined by $(x,y) \to (x+y,\ x+2y)$
(modulo 2π) or

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \qquad (\text{modulo } 2\pi).$$

This is the case in which Markov’s partitions were first discovered, [AW68].

In this case we see easily that the expanding and contracting planes are
simply the lines through a point (x,y) parallel to the eigenvectors of the
matrix $M = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$. The stable and unstable manifolds are, therefore,
lines through (x, y) parallel to these eigenvectors and regarded as drawn on
the manifold T.

Thus they cover it densely because the slope of these lines is irrational 
(being (1 + √5)/2). We see in this example also why it is (in general)
necessary to distinguish between the local stable and unstable manifolds 
and the global ones (which are dense while the local manifolds are not). 
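
The statements about the eigenvectors of the matrix M with rows (1,1) and (1,2) can be checked with a few lines of Python. This is only a numerical sketch; the variable names are illustrative assumptions. It computes the two eigenvalues, the slopes of the expanding and contracting directions, and verifies that the two directions are orthogonal, as expected for a symmetric matrix.

```python
import numpy as np

M = np.array([[1.0, 1.0], [1.0, 2.0]])

# For a symmetric matrix, eigh returns the eigenvalues in ascending order
# together with orthonormal eigenvectors (stored as columns).
eigvals, eigvecs = np.linalg.eigh(M)
contracting, expanding = eigvecs[:, 0], eigvecs[:, 1]

slope_expanding = expanding[1] / expanding[0]
slope_contracting = contracting[1] / contracting[0]
golden = (1 + np.sqrt(5)) / 2

print("eigenvalues:", eigvals)
print("slopes:", slope_expanding, slope_contracting)
print("scalar product of the two directions:", expanding @ contracting)
```

The slope of the expanding direction agrees with (1 + √5)/2, and the product of the two eigenvalues is 1 because the determinant of M is 1: the map preserves area while stretching one direction and shrinking the other.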

It is easy, by using ruler and compass, to draw a Markov pavement for the
above map: an example is given in Fig. 9.5.3.

This is a partition for the torus map just introduced: Fig. 9.5.3a represents
many copies of the torus pavement in Fig. 9.5.3b (unraveled as a pavement
of the plane). Each copy is a square with side 2π. The two lines are parallel
to the eigenvectors of the above matrix M generating the map. The line
with positive slope is parallel to the eigenvector with eigenvalue larger than
1, and the two lines are orthogonal because the matrix is symmetric.

In this simple case the expanding and contracting directions are trivially 
parallel to the two drawn lines. If we draw the lines of Fig. 9.5.3a as 
they really appear when wrapped back into the torus we get Fig. 9.5.3b 
where the endpoints of the lines are marked as big dots. The lines define 
several rectangles (both in the geometrical sense and in the above introduced 
dynamical sense, which explains the attribute “rectangular” used) with a 
few exceptions (three in the picture). They are due to the termination of 
the initial lines “in the middle of nowhere”. The dashed lines continue the 
original lines (on both ends) until they meet a line, thus completing the 
missing rectangles. The result depends on the order of the continuation 
operations: an irrelevant ambiguity (as in any event Markov’s partitions 
are by no means unique⁸, when they exist).

Fig. 9.5.3b could have been obtained directly without any elongation of the 
lines had the initial lines been drawn of appropriate size. The construction 
shows how to find the appropriate size (using only “ruler and the compass” 
as required by every noble drawing, or by a Postscript program using only 
“integers and quadratic irrationals”).

The union of the rectangle boundaries parallel to the line with negative 
(positive) slope is transformed into itself by the action of the map (inverse 
map). This is so because it is a connected piece of the stable manifold of the 
trivial fixed point that is the origin; it shows that the property of Fig. 9.5.2 
is indeed satisfied. Hence Fig. 9.5.3b is a simple example of a “Markovian 
pavement” also called a “Markov partition” (and its discovery was at the 
beginning of the developments discussed here, [AW68], [Si68]). 

In fact given a Markov partition we can generate a much finer partition 
simply by transforming it with the various iterates of the map and then 
intersecting the collection of pavements thus obtained, see footnote 8. The 
new partition of phase space is obviously still Markovian but it can be made 


as fine as we want.

8 For instance if $P = \{E_i\}$ is a Markov partition then $P' = \{E_i \cap SE_j\}$, $P'' = \{S^{-1}E_i \cap
E_j\}$ and $P''' = \{S^{-1}E_i \cap E_j \cap SE_m\}$ (and so on) are Markov partitions.


The above definitions and related properties can be extended to systems 
which have an attracting set consisting of a smooth surface and the time 
evolution restricted to the surface becomes a mixing Anosov system. The 
extension can even be pushed to systems with attracting sets that are not 
smooth but that are still hyperbolic (“axiom A attractors”, [Ru76]). We 
shall not deal here with this more general notion (for simplicity). 

Anosov systems that are mixing are topologically mixing in the sense that 
given any pair of open sets U, V there is an $n_V$ such that the iterates $S^n U$
have a nonempty intersection with V for all $n > n_V$. Hence imposing
mixing is the simple way of imposing the condition that the system phase 
space cannot be “disconnected” into disjoint parts under the action of S or 
of any of its iterates (alternative to assuming existence of a dense orbit). 

A general theorem (“Smale’s spectral theorem”) deals with cases in which 
one does not assume mixing: under the assumptions (a) and (b) in the 
definition of §9.4 and adding the extra assumption: 


(c) Periodic motions are dense on phase space. 


the phase space M can be decomposed into a union $M_1 \cup M_2 \cup \cdots \cup M_n$
in each of which there is a dense orbit. If n = 1 the system is said
to be transitive. Hence each system satisfying assumptions (a), (b), (c)
can be regarded as a collection of finitely many transitive Anosov systems.
Furthermore if an Anosov system is transitive then its phase space M can be
decomposed as a union $M = M'_1 \cup M'_2 \cup \cdots \cup M'_{n'}$, on each of which $S^{n'}$ acts
as a mixing Anosov system.

In other words given an Anosov system with dense periodic points either
topological mixing holds for the map $S^n$ for some n, or phase space splits
into a finite number of disjoint closed components (called “spectral elements”
of the system) in each of which, for some large enough n, the map
$S^n$ is topologically mixing. The mixing assumption is not as strong as it
may at first appear: if it does not hold it is because in some sense we 
have chosen the phase space inappropriately not noticing that motion was 
actually taking place on a smaller space. 


Markov partitions set up a nice “coarse graining” permitting us to think
of the dynamical system as a copy of something very familiar in statistical
mechanics: namely the one-dimensional Ising model, or of one of its extensions
considered in §5.10. The correspondence is via the symbolic dynamics
associated with Markov partitions.

One defines for each point x a sequence $\underline\sigma(x)$ of digits each of which can
take $\mathcal N$ values if $\mathcal N$ is the number of elements of a Markov partition $\mathcal E$ with
elements so small that the image of every rectangle intersects all the other
rectangles at most in a connected part (i.e. the size of the rectangles is
so small that even when stretched by the one time step evolution map, it
remains small compared to the curvature of the sides).

One defines an $\mathcal N \times \mathcal N$ transition matrix $T_{\sigma\sigma'}$ by setting $T_{\sigma\sigma'} = 1$ if the interior
of the rectangle $E_\sigma$ evolves under S into a set intersecting the interior of
the rectangle $E_{\sigma'}$; we set $T_{\sigma\sigma'} = 0$ otherwise.

The definitions readily imply a few properties. A transition matrix is said
to be transitive if for any pair σ, σ′ there is a power n such that $(T^n)_{\sigma,\sigma'} > 0$;
this means that there is a sequence which is compatible and which contains
the symbol σ and, to the right of it, σ′. A transition matrix is said to
be mixing if for all n large enough $(T^n)_{\sigma,\sigma'} > 0$ for all pairs σ, σ′. A Markov partition for a
transitive Anosov system has a transitive matrix which is also mixing if the
system is mixing.
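
The two properties of a transition matrix just defined can be tested mechanically. The Python sketch below does so for two small matrices, which are hypothetical examples chosen for illustration and do not come from a Markov partition discussed in the text. The function names are also assumptions. The mixing test relies on a classical bound for nonnegative matrices (Wielandt): if some power of an N × N matrix has all entries positive then the power (N−1)² + 1 already does.

```python
import numpy as np

def reach(T, n):
    """Boolean matrix whose (a, b) entry is True when (T^n)_{ab} > 0."""
    B = (np.asarray(T) > 0).astype(np.int64)
    R = np.eye(len(B), dtype=np.int64)
    for _ in range(n):
        # Keep only the 0/1 pattern, so that entries never overflow.
        R = ((R @ B) > 0).astype(np.int64)
    return R > 0

def is_transitive(T):
    """For every pair (a, b) some power n >= 1 has (T^n)_{ab} > 0.

    Paths of length at most N suffice, N being the number of symbols.
    """
    N = len(T)
    seen = np.zeros((N, N), dtype=bool)
    for n in range(1, N + 1):
        seen |= reach(T, n)
    return bool(seen.all())

def is_mixing(T):
    """All entries of T^n are positive for all large n (Wielandt's bound)."""
    N = len(T)
    return bool(reach(T, (N - 1) ** 2 + 1).all())

# Hypothetical examples: a 2-symbol matrix forbidding the pair (1, 1),
# and a cyclic permutation of 3 symbols.
T_golden = np.array([[1, 1], [1, 0]])
T_cycle = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])

print(is_transitive(T_golden), is_mixing(T_golden))
print(is_transitive(T_cycle), is_mixing(T_cycle))
```

The cyclic matrix is the symbolic counterpart of the case, excluded in the definition of mixing Anosov systems, of phase space consisting of parts cyclically permuted by S: it is transitive but not mixing.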

A sequence $\{\sigma_k\}$ is “allowed” or “compatible”, see §5.10, if it is a sequence of
symbols such that $T_{\sigma_k\sigma_{k+1}} = 1$ for all integers k ∈ (−∞, ∞). A meditation
on Fig. 9.5.3, i.e. on the covariance property, will convince the reader that
if $\underline\sigma = \{\sigma_k\}$ is allowed there must be a point x such that $S^k x \in E_{\sigma_k}$ for all
integers k ∈ (−∞, ∞); and this point must be unique by the hyperbolicity of
the transformation (if there were two such points they would travel visiting
always the same boxes of $\mathcal E$: which is impossible because if two points
visit the same boxes during the time [−T,T] then, from the definitions of
hyperbolicity, their distance must be not bigger than $O(Ce^{-\lambda T})$, cf. (9.4.1)).

Therefore it follows that we can establish a correspondence between points 
and compatible sequences. There may be exceptionally more than one se- 
quence representing the same point, but this can happen only if the point 
is either on a boundary of a rectangle of $\mathcal E$ or on that of one of its images
under iterates of S. Hence the set of points that are represented by more 
than one sequence has zero volume and, therefore, it can be ignored for 
the purposes of our discussion (which in any event disregards sets of zero 
volume). 

This means that we can map, or “code”, phase space into a space of se- 
quences: any function on phase space becomes a function of the sequences. 
The coding of points into sequences of digits is very similar to the familiar 
coding of its coordinates into decimal sequences (which is also well defined 
apart from a zero volume (dense) set of exceptional points, namely the 
points whose coordinates are numbers ending with an infinite string of 0’s 
or of 9’s). And it is harder but much better in spite of the fact that the dec- 
imal representation is the “usual” representation of points in phase space, 
both in theoretical applications and in numerical experiments. 

It is better because it is adapted to the dynamics and turns the most 
chaotic dynamics into a “standard” one (still chaotic), namely the shift 
on a space of sequences of symbols subject to a nearest neighbor constraint 
(that $T_{\sigma_k\sigma_{k+1}} = 1$), also called a hard core, see §5.10. The latter dynamical
systems are often called “subshifts of finite type”, or “one-dimensional spin 
(or particle) chains” for obvious reasons. 
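
A finite window of such a “spin chain” is easy to explore by brute force. The following Python sketch lists the compatible strings of a given length for a hypothetical two-symbol transition matrix (an illustrative assumption, not a matrix from the text), compares their number with the sum of the entries of a power of T, and applies the shift to a finite window. The function names are assumptions as well.

```python
from itertools import product
import numpy as np

def compatible_strings(T, length):
    """All strings (s_1, ..., s_length) with T[s_k][s_{k+1}] = 1 for all k."""
    symbols = range(len(T))
    return [s for s in product(symbols, repeat=length)
            if all(T[a][b] == 1 for a, b in zip(s, s[1:]))]

def shift(s):
    """Shift acting on a finite window: the first symbol is dropped."""
    return s[1:]

# Hypothetical example: two symbols, with the pair (1, 1) forbidden ("hard core").
T = [[1, 1], [1, 0]]

strings = compatible_strings(T, 5)
# The number of compatible strings of length L equals the sum of the entries of T^(L-1).
count_from_powers = int(np.linalg.matrix_power(np.array(T), 4).sum())
print(len(strings), count_from_powers)
print(strings[:3], [shift(s) for s in strings[:3]])
```

The shifted windows are again compatible strings: the nearest neighbor constraint is translation invariant, which is why the shift is a well defined dynamics on the space of compatible sequences.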

A function F(x) which is mildly regular on phase space, e.g. Hölder continuous
with exponent α, becomes a function $F(x(\underline\sigma))$ of $\underline\sigma$ which has a weak
dependence on the digits $\sigma_k$ of $\underline\sigma$ with large |k|: i.e. if $\underline\sigma$ and $\underline\sigma'$ agree on
the digits between −k and k then the distance between $x(\underline\sigma)$ and $x(\underline\sigma')$ is
$< O(Ce^{-\lambda k})$ so that $F(x(\underline\sigma)) - F(x(\underline\sigma'))$ is bounded by a constant times $e^{-\alpha\lambda k}$.

In particular the expansion rate $\Lambda_u(x)$ of the phase space volume over one
time step of the map S (i.e. the determinant of the Jacobian ∂S of the map
S, also called the contraction rate of phase space because $\Lambda_u(x)$ is a “real”
contraction or a “real” expansion depending on whether it is < 1 or > 1)
will satisfy for some C, a:

$$|\Lambda_u(x(\underline\sigma)) - \Lambda_u(x(\underline\sigma'))| < C e^{-a k} \tag{9.5.1}$$


if the digits of $\underline\sigma$ and $\underline\sigma'$ with labels between −k and k agree. This is because
hyperbolicity implies that the stable and unstable tangent planes at x are
very smooth and vary in a Hölder continuous fashion with the point x (as
mentioned in §9.3, this is Anosov’s theorem), so that also the functions
$\Lambda_u(x), \Lambda_s(x)$ are Hölder continuous.

One can interpret (9.5.1) as saying that the function $\lambda_u(\underline\sigma) \stackrel{def}{=} \log\Lambda_u(x(\underline\sigma))$
has short range as a function of the symbolic sequence $\underline\sigma$ and the following
remarks are worth the effort necessary for understanding their formulation,
admittedly hard at first sight (yielding to a reassuring sense of triviality
after some thought):


(1) The set of points which are symbolically represented by sequences that
agree between $-\frac12 T$ and $\frac12 T$ is just the set of points in the very small
rectangle of $\mathcal{E}_{-\frac12 T,\frac12 T}$ consisting of the intersections $\cap_{j=-T/2}^{T/2} S^{-j}E_{\sigma_j}$ (with sides
bounded proportionally to $e^{-\lambda T/2}$), see footnote 8 above. Hence a sum over
the elements $E \in \mathcal{E}_{-\frac12 T,\frac12 T}$ can be written as a sum over the sequences
$(\sigma_{-\frac12 T},\ldots,\sigma_{\frac12 T})$ (which are compatible). The reader will be greatly helped by
attempting to draw a representation of a set $\cap_{j=-T/2}^{T/2} S^{-j}E_{\sigma_j}$ in the manner
of Fig. 9.5.2 above.


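(2) A point $x \in E = \cap_{j=-T/2}^{T/2} S^{-j}E_{\sigma_j}$ is determined by a compatible bi-infinite
sequence $\underline\sigma$ which continues $\sigma_{-\frac12 T},\ldots,\sigma_{\frac12 T}$ on either side to an infinite
(compatible) sequence $\ldots,\sigma_{-\frac12 T-1},\sigma_{-\frac12 T},\ldots,\sigma_{\frac12 T},\sigma_{\frac12 T+1},\ldots$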


(3) Since there are only finitely many values for each symbol we can define
for each symbol $\sigma$ compatible sequences $\underline\sigma^R(\sigma) = (\sigma'_1,\sigma'_2,\ldots)$ and $\underline\sigma^L(\sigma) =
(\ldots,\sigma''_{-2},\sigma''_{-1})$ infinite to the right and to the left, respectively, and such
that $T_{\sigma\sigma'_1} = 1$ and $T_{\sigma''_{-1}\sigma} = 1$, i.e. respectively right and left compatible
with $\sigma$: this can be done in many ways. We shall call a pair of functions
$\sigma \to (\underline\sigma^R(\sigma), \underline\sigma^L(\sigma))$ a boundary condition.

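If we are given a boundary condition $(\underline\sigma^R(\sigma), \underline\sigma^L(\sigma))$ we can associate with
each element $E = \cap_{j=-T/2}^{T/2} S^{-j}E_{\sigma_j}$ a point c(E) whose (bi-infinite) symbolic
sequence is $\underline\sigma^L(\sigma_{-\frac12 T}),\sigma_{-\frac12 T},\ldots,\sigma_{\frac12 T},\underline\sigma^R(\sigma_{\frac12 T})$. We call c(E) the “center”
of E with respect to the boundary condition $\sigma \to (\underline\sigma^R(\sigma), \underline\sigma^L(\sigma))$.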


The bi-infinite continuation, which we can naturally call “Markovian”, of the
finite string $\sigma_{-\frac12 T},\ldots,\sigma_{\frac12 T}$ to an infinite compatible sequence is possible
because of the topological transitivity property supposed for the system
(consequence of the mixing).9


(4) We can interpret a continuation of the sequence $\sigma_{-\frac12 T},\ldots,\sigma_{\frac12 T}$ as the
assignment of a boundary condition to a spin configuration in the
one-dimensional box $[-\frac12 T, \frac12 T]$ in the sense of the discussion in §5.10, following
(5.10.16).


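(5) If x is a point in $E = \cap_{j=-T/2}^{T/2} S^{-j}E_{\sigma_j}$ the product of expansion factors
$\prod_{k=-T/2}^{T/2} \Lambda_u(S^k x)^{-1}$ can be written

$$e^{-\sum_{k=-T/2}^{T/2} \lambda_u(\vartheta^k\underline\sigma)} \tag{9.5.2}$$

with $\lambda_a(\underline\sigma) = \log\Lambda_a(x(\underline\sigma))$, a = u, s, where $\underline\sigma$ is the symbolic sequence of x
and $\vartheta$ is the shift on sequences.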


(6) One says that the mixing Anosov system (M, S) admits a symmetry if
there is an isometry I of phase space with $I^2 = 1$ which is either commuting
or anticommuting with S: i.e. $IS = SI$ or $IS = S^{-1}I$, respectively. In
the second case the symmetry is called a time reversal symmetry: of course
the same system may admit several time reversal symmetries. A Markov
pavement $\mathcal{P}$ for (M, S) is said to be “I-symmetric” if $E \in \mathcal{P}$ implies that
$IE \in \mathcal{P}$; in this case, again by the transitivity of the compatibility matrix,
we can define time reversible boundary conditions $\sigma \to (\underline\sigma^R(\sigma), \underline\sigma^L(\sigma))$ in
such a way that the centers c(E) of the sets $E = \cap_{j=-T/2}^{T/2} S^{-j}E_{\sigma_j}$ satisfy the
covariance property: $Ic(E) = c(IE)$.
In the commuting case this is so because we can choose the continuation
of $\sigma_k$ to the right and that of $i\sigma_k$ to the right to be “consistent”, i.e. if
$\sigma_{k+1},\sigma_{k+2},\ldots$ continues $\sigma_k$ then $i\sigma_{k+1},i\sigma_{k+2},\ldots$ continues $i\sigma_k$ (here $i\sigma$ denotes
the label of the element $IE_\sigma$). In the anticommuting case the continuation of
$\sigma$ to the right has to be chosen consistent in the same sense with that of
$i\sigma$ to the left. Such consistent choices are Markovian in the above sense.


9 Given a symbol $\bar\sigma$, let n be such that $(T^n)_{\bar\sigma\bar\sigma} > 0$. Let $n_0$ be such that
$(T^{n_0})_{\sigma\bar\sigma} > 0$; this means that there is a sequence $\bar\sigma,\sigma_2,\ldots,\sigma_n,\bar\sigma$ which
is compatible, and also a sequence $\sigma,\tilde\sigma_2,\ldots,\tilde\sigma_{n_0},\bar\sigma$ which is compatible: therefore
$\underline\sigma^R(\sigma) \stackrel{def}{=} \tilde\sigma_2,\ldots,\tilde\sigma_{n_0},\bar\sigma,\sigma_2,\ldots,\sigma_n,\ldots$, where the last dots indicate indefinite repetition
of the string $\bar\sigma,\sigma_2,\ldots,\sigma_n$, is an infinite string “continuing” the symbol $\sigma$ to the right
into a compatible sequence. Likewise one builds a $\underline\sigma^L(\sigma)$ continuing the symbol $\sigma$ to the
left into a compatible sequence. The name “Markovian” is due to the property that the
sequences $\underline\sigma^R(\sigma)$, $\underline\sigma^L(\sigma)$ share: namely they depend solely on $\sigma$.


(7) Another interesting consequence of mixing is that given a sequence
$\sigma_{-s},\ldots,\sigma_s$ and supposing that p is such that $T^p_{\sigma\sigma'} > 0$ for all $\sigma,\sigma'$ (which
is an analytic form of the mixing property) we can form a compatible
sequence $\sigma_{-s},\ldots,\sigma_s,\sigma_{s+1},\ldots,\sigma_{s+p} \equiv \sigma_{-s}$: therefore we can form an infinite
sequence in which the latter sequence is repeated indefinitely into an infinite
and compatible sequence. The point x that corresponds to this sequence is,
necessarily, a periodic point (with period 2s + p, the length of the repeated
block $\sigma_{-s},\ldots,\sigma_{s+p-1}$). A consequence is the density of periodic points in
phase space: every mixing Anosov system admits
a dense set of periodic orbits.
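The construction in remark (7) can be sketched on a toy example. The code below assumes the two-symbol compatibility matrix `T` with the pair (1, 1) forbidden, for which $T^2$ has all entries positive (so p = 2 works); the helper names `connecting_path` and `periodic_block` are hypothetical.

```python
def connecting_path(T, start, end, p):
    """Return a compatible list of p + 1 symbols going from `start` to `end`
    in exactly p steps, or None if no such path exists."""
    n = len(T)
    paths = {start: [start]}          # paths[j]: one compatible path ending at j
    for _ in range(p):
        new = {}
        for last, path in paths.items():
            for j in range(n):
                if T[last][j] == 1 and j not in new:
                    new[j] = path + [j]
        paths = new
    return paths.get(end)

def periodic_block(T, string, p):
    """Extend a compatible string so that it closes up on its first symbol
    after p extra steps; the returned block can be repeated indefinitely."""
    path = connecting_path(T, string[-1], string[0], p)
    if path is None:
        raise ValueError("no connecting path of length p")
    return list(string) + path[1:-1]

T = [[1, 1],
     [1, 0]]                          # T^2 = [[2, 1], [1, 1]] > 0, so p = 2
block = periodic_block(T, (1, 0, 0, 1), p=2)
print(block)
```

Since any finite compatible string can be closed up in this way, every small rectangle of a refined partition contains a periodic point, which is the density statement above.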


Because of all the above properties Anosov systems (and the related Axiom 
A attractors) are the paradigm of chaotically behaving systems. The basic 
idea, developed in the early 1970s by D. Ruelle, is that such systems are not 
just curiosities but rather they are in some sense the “rule” when dealing 
with real dynamics. 


§9.6. Statistics of Chaotic Attractors. SRB Distributions


Perhaps the most important property of mixing Anosov systems or systems
with Axiom A attractors is that they admit stationary states, i.e. the limits
in (9.3.1) exist and the “statistics” $\mu$ of almost all data x (i.e. all data
outside a zero volume set) exists. Furthermore the probability distribution
$\mu$, called the SRB-distribution, describing them can be characterized quite
explicitly.


Note that the “almost all” is an essential feature of the definition; in fact,
see §9.4, Anosov (or Axiom A) systems will have a dense set of periodic
points covering phase space (or the attracting set): any such point x, at
least, will of course be an exceptional point as far as the value of the limit
in (9.3.1) is concerned.


The SRB distribution can be given an expression in terms of the kinemat- 
ical properties discussed in the previous section. This is an expression that 
can play a role similar to that played by the Boltzmann-Gibbs expression 
for the equilibrium distributions. It is an expression that clearly cannot 
be computed in any nontrivial case, much like the integrals that express 
equilibrium properties in terms of integrals with respect to the canonical 
distribution. 

However, like the integrals with respect to the canonical distribution, it 
can be useful to derive relations that must hold between various averages. 
Therefore formal expressibility of the SRB distribution seems to be a very 
important property for nonequilibrium theory. 

The formula can be rather easily justified at an informal level, however even 
this requires good will on the part of the reader, to the extent that he will 
develop it only if convinced of its utility. Therefore we relegate to Appendix 
9.A2 below the “informal” analysis and we confine ourselves to giving here 
the SRB distribution expression in a form sufficient for the discussion of a 
few applications. 

One needs to define the SRB average of a generic observable F(x) on phase
space: $\langle F\rangle_+ = \int F(y)\,\mu(dy)$. For this purpose we divide phase space into cells
so small that F(x) is constant in each of them.

The division into cells will be conveniently made by using a Markov
partition $\mathcal{E}$ (it will not matter which one); of course the elements $E \in \mathcal{E}$ will
not, in general, be so small that F is constant in each of them. However we
can refine $\mathcal{E}$ simply by considering the partition $\mathcal{E}_{-T,T} = \cap_{j=-T}^{T} S^{-j}\mathcal{E}$; as
remarked in §9.4 this is a much finer partition (and the size of the cells is
of the order $O(e^{-\lambda T})$ if $\lambda$ is defined in (9.4.1)).

Each element of $\mathcal{E}_{-T,T}$ has the form $\cap_{j=-T}^{T} S^{-j}E_{\sigma_j}$ where $\sigma_{-T},\ldots,\sigma_T$ is a
compatible sequence of symbols.

Given a boundary condition $\underline\sigma^L, \underline\sigma^R$ in the sense of §9.5, which one again
does not matter, we can define a compatible bi-infinite sequence:

$$\underline\sigma_E = (\underline\sigma^L(\sigma_{-T}),\sigma_{-T},\ldots,\sigma_T,\underline\sigma^R(\sigma_T)) \tag{9.6.1}$$

for each $E \in \mathcal{E}_{-T,T}$.

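If c(E) is the point of E whose symbolic representation relative to the
partition $\mathcal{E}$ is $\underline\sigma_E$, we can define the expansion rate $\Lambda_{u,2T}(E)$ of the map $S^{2T}$
regarded as a map between $S^{-T}c(E)$ and $S^{T}c(E)$ as, cf. (9.5.2),

$$\Lambda_{u,2T}(E) = \Lambda_{u,2T}(c(E)) \stackrel{def}{=} \prod_{k=-T}^{T-1}\Lambda_u(S^k c(E)) = \prod_{k=-T}^{T-1} e^{\lambda_u(\vartheta^k\underline\sigma_E)} \tag{9.6.2}$$

Then the SRB distribution $\mu$ can be written as

$$\int F(y)\,\mu(dy) = \lim_{T\to\infty} \frac{\sum_{E\in\mathcal{E}_{-T,T}} \Lambda^{-1}_{u,2n(T)}(c(E))\,F(c(E))}{\sum_{E\in\mathcal{E}_{-T,T}} \Lambda^{-1}_{u,2n(T)}(c(E))} \tag{9.6.3}$$

where n(T) ≤ T is any sequence tending to ∞ as T → ∞.
A particularly convenient choice will be n(T) = T so that, with c = c(E),

$$\int F(y)\,\mu(dy) = \lim_{T\to\infty} \frac{\sum_{E\in\mathcal{E}_{-T,T}} \Lambda^{-1}_{u,2T}(c)\,F(c)}{\sum_{E\in\mathcal{E}_{-T,T}} \Lambda^{-1}_{u,2T}(c)} \tag{9.6.4}$$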


Remarks: 


(1) The weight in (9.6.4) can be written, in terms of the function $\lambda_u(\underline\sigma)$
defined in §9.5, see (9.5.2), and of the shift operation $\vartheta$ on the infinite
sequences, as

$$\Lambda^{-1}_{u,2n(T)}(E) = e^{-\sum_{k=-n(T)}^{n(T)-1} \lambda_u(\vartheta^k\underline\sigma)} \tag{9.6.5}$$

where $\underline\sigma = \underline\sigma_E$ is the sequence that is obtained by continuing $\sigma_{-T},\ldots,\sigma_T$
to an infinite sequence as prescribed by the chosen boundary condition,
see (9.6.1). This is a slightly different rewriting of (9.6.2). It shows that
the SRB distribution can be interpreted as a probability distribution on the
space of bi-infinite compatible sequences $\underline\sigma$. As such it is a Gibbs distribution
for a one-dimensional lattice spin model with short-range interaction and
nearest neighbor hard core (due to the compatibility restriction that limits
the allowed configurations $\underline\sigma$). In fact (9.5.1) says that $\sum_{k=-n(T)}^{n(T)-1} \lambda_u(\vartheta^k\underline\sigma)$
can be interpreted as the energy of a spin configuration under a potential
that is exponentially decreasing at ∞, see §5.10, (5.10.14), (5.10.18).

In §5.8 we have seen that one-dimensional short-range systems are quite
trivial from the statistical mechanics viewpoint and the theory of chaotic
motions inherits quite a few results from the theory of such lattice systems.
Such results are often quite nontrivial and surprising when seen as
properties of chaotic motions. For instance uniqueness of SRB distributions
corresponds to the absence of phase transitions in one-dimensional lattice
systems. The thermodynamic limit corresponds to the limit as T → ∞.
Exponential decay of time correlations in the SRB distributions corresponds
to the exponential decay of correlations in one-dimensional short-range
lattice systems. Large deviation theorems correspond to the analyticity of the
thermodynamic functions and so on.

For the above reasons the theory of Anosov (and Axiom A) systems has 
been called thermodynamic formalism, [Ru78b]. 


(2) If n(T)/T → 0 then by (9.6.5) and (9.5.1) we see that the weight
$\Lambda^{-1}_{u,2n(T)}(E)$ given to the cell E of $\mathcal{E}_{-T,T}$ does not change appreciably as the
boundary condition is changed.10 This is a kind of “mean value theorem”
for the SRB distribution.


(3) But if n(T) = T the variation of $\Lambda_{u,2T}(x)$ within E is appreciable
because, clearly, the sum in (9.5.2) will undergo variations of order O(1),
when the boundary condition is changed.


(4) Hence (9.6.4) is a deeper property than (9.6.3) with n(T)/T → 0. It
is proved easily in the thermodynamic formalism because it reduces to the
statement that one-dimensional lattice gases with short-range interactions
show no phase transitions, therefore the boundary condition dependence
of the averages of local observables disappears in the thermodynamic limit
(T → ∞), see §5.8.
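The disappearance of the boundary condition dependence can be seen numerically in a toy Gibbs chain. The sketch below is not the SRB distribution of any actual Anosov map: it assumes a two-symbol hard core matrix `T`, a made-up nearest neighbor potential `lam` playing the role of $\lambda_u$, and the local observable $F(\underline\sigma) = \sigma_0$; the boundary conditions `bc_a` and `bc_b` assign to each end symbol one compatible neighbor.

```python
import math
from itertools import product

T = [[1, 1], [1, 0]]                 # compatibility matrix (hard core)
lam = [[0.3, 1.1], [0.7, 0.0]]       # hypothetical nearest neighbour potential

def gibbs_average(half_width, left_bc, right_bc):
    """Average of F(sigma) = sigma_0 over compatible strings sigma_{-T..T},
    each weighted by exp(-energy), with one boundary symbol on each side."""
    num = den = 0.0
    length = 2 * half_width + 1
    for s in product((0, 1), repeat=length):
        if any(T[s[i]][s[i + 1]] == 0 for i in range(length - 1)):
            continue                              # skip incompatible strings
        ext = (left_bc[s[0]],) + s + (right_bc[s[-1]],)
        energy = sum(lam[ext[i]][ext[i + 1]] for i in range(len(ext) - 1))
        w = math.exp(-energy)
        num += w * s[half_width]                  # s[half_width] is sigma_0
        den += w
    return num / den

bc_a = {0: 0, 1: 0}                  # neighbour assigned to each end symbol
bc_b = {0: 1, 1: 0}                  # a different, still compatible, choice

gaps = [abs(gibbs_average(t, bc_a, bc_a) - gibbs_average(t, bc_b, bc_b))
        for t in (1, 3, 5)]
print(gaps)
```

The gap between the two averages shrinks geometrically as the box grows, at a rate set by the ratio of the two eigenvalues of the transfer matrix; this is the toy counterpart of the absence of phase transitions invoked in remark (4).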


(5) If the attractor is invariant under the action of a time reversal symmetry
I, see §9.5, we can, and shall, suppose that the Markov partition is
I-reversible: if $E \in \mathcal{E}$ then $IE \in \mathcal{E}$.11 Furthermore the centers c in (9.6.2)
can be chosen so that if c is the center of E then Ic is that of IE, see §9.5,
comment (6). This means that we have to choose a reversible boundary
condition to select the centers.


10 Note that at fixed E and as the boundary condition is varied the center point c(E)
varies (densely) inside E.


11 If not one could use the finer partition obtained by intersecting $\mathcal{E}$ and $I\mathcal{E}$ so that the
new partition will be time reversible.


An expression of the SRB distribution based on a reversible partition $\mathcal{E}$ and
on a reversible boundary condition will be called a reversible representation
of an SRB distribution.


In Appendix 9.A2 below we give details of an informal derivation of (9.6.3), 
but the above properties will suffice for deducing many interesting con- 
sequences of the chaotic hypothesis, in the form of general properties of 
Anosov systems. 

It is interesting and important to discuss the connection between the
Boltzmannian representation of motion as a cyclic permutation of the phase space
cells and the new symbolic representation of motion in Anosov systems. In
fact the chaotic hypothesis is supposed to hold also in the equilibrium cases 
and therefore one has two different representations of motions for the same 
system. 

According to views already repeatedly expressed by Boltzmann, [Bo74], 
this “dualism” is not a priori impossible although in most cases in which 
he envisaged this possibility it seems that he did not really believe in it. 
The above seems to be a very fine and nontrivial instance in which a dual 
representation is possible. This is discussed in Appendix 9.A1 below, see
also [Ga95a].


§9.7. Entropy Generation. Time Reversibility and Fluctuation
Theorem. Experimental Tests of the Chaotic Hypothesis


The connection between the general kinematical analysis of chaotic motions 
and applications can be established if one accepts that the motions of a 
many-particle system are so “chaotic” that one can regard the system as a 
mixing Anosov system in the sense of §9.4. 

One of the key notions in equilibrium statistical mechanics is that of
entropy; its extension to nonequilibrium is surprisingly difficult, assuming that
it really can be extended. In fact we expect that, in a system that reaches
under forcing a stationary state, entropy is produced at a constant rate so
that there is no way of defining an entropy value for the system, except
perhaps by saying that its entropy is −∞.

Although one should keep in mind that there is no universally accepted
notion of entropy in systems out of equilibrium, even when in a stationary
state, we shall take the attitude that in a stationary state only the entropy
creation rate is defined: the system entropy decreases indefinitely, but at a
constant rate.12 Note that we say “decreases” and not “increases” because
in a nonequilibrium situation nonconservative forces work upon the system
and, since the system is supposed to be in a stationary state, such work must
be ceded to the exterior in the form of heat at constant temperature. So
entropy (of the system) decreases and entropy of the surroundings increases
(as expected).


12 Defining “entropy” and “entropy production” should be considered an open problem.

The natural definition of entropy creation rate is, following Gibbs and 
Boltzmann, that of the time derivative of the entropy of a state of the 
system that evolves towards stationarity, [An82], [Ru97c]. 

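We look, temporarily, again at the evolution of our systems in continuous
time: so that we imagine that it corresponds to a differential equation
$\dot x = f(x)$. A “state” which at time t = 0 is described by a distribution $\mu_0$
with density $\rho_0(x)$ with respect to the volume element dx on phase space
becomes at time t a distribution $\mu_t(dx) = \rho_t(x)dx$ with density
$\rho_t(x) = \rho_0(S_{-t}x)\,\frac{\partial S_{-t}x}{\partial x}$, where $J_t(x) = \frac{\partial S_{-t}x}{\partial x}$ is the Jacobian determinant at x of
the map $x \to S_{-t}x$. Defining the “entropy” of $\mu_t$ as

$$\mathcal{E}(t) = -\int \rho_t(x)\log\rho_t(x)\,dx = -\int \rho_0(S_{-t}x)\,\frac{\partial S_{-t}x}{\partial x}\,\log\Big(\rho_0(S_{-t}x)\,\frac{\partial S_{-t}x}{\partial x}\Big)\,dx \tag{9.7.1}$$

we deduce that

$$\mathcal{E}(t) = -\int \rho_0(S_{-t}x)\,\frac{\partial S_{-t}x}{\partial x}\,\log\rho_0(S_{-t}x)\,dx - \int \rho_0(S_{-t}x)\,\frac{\partial S_{-t}x}{\partial x}\,\log\frac{\partial S_{-t}x}{\partial x}\,dx \tag{9.7.2}$$

and the first term on the right hand side does not contribute to $\dot{\mathcal{E}}$ because
it equals the constant $-\int \rho_0(y)\log\rho_0(y)\,dy$ (just set $S_{-t}x = y$); therefore
$\dot{\mathcal{E}}$ equals the t derivative of the second term, which can be transformed by
setting $y = S_{-t}x$ into

$$-\int \rho_0(y)\,\log\frac{\partial S_{-t}(S_t y)}{\partial S_t y}\,dy = \int dy\,\rho_0(y)\,\log\frac{\partial S_t y}{\partial y} \tag{9.7.3}$$

having used the identity

$$\frac{\partial S_{-t}(S_t y)}{\partial S_t y}\,\frac{\partial S_t y}{\partial y} \equiv 1. \tag{9.7.4}$$

We now make use of the other identity

$$\frac{d}{dt}\frac{\partial S_t y}{\partial y} = -\frac{\partial S_t y}{\partial y}\,\sigma(S_t y) \tag{9.7.5}$$

where $\sigma(x)$ is the divergence of $-f(x)$ (writing the equations of motion as
$\dot x = f(x)$). It follows that the rate of entropy creation is, see [An82],

$$\dot{\mathcal{E}} = -\int \rho_0(y)\,\sigma(S_t y)\,dy = -\int \mu_t(dy)\,\sigma(y) \xrightarrow[t\to\infty]{} -\int \mu(dy)\,\sigma(y) \tag{9.7.6}$$

if $\mu_t$ suitably converges to $\mu$.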
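The identity (9.7.5) can be checked numerically in the simplest possible case. The sketch below assumes a linear vector field $f(x) = Ax$ with a made-up 2×2 matrix `A`, for which $S_t y = e^{At}y$, the Jacobian determinant is $\det e^{At}$ and $\sigma = -\mathrm{tr}\,A$ is constant; the matrix exponential is summed from its Taylor series, which is adequate only for small $|At|$. A linear flow is of course not a bounded chaotic system, so this illustrates the identity only.

```python
import math

def mat_mul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_exp(A, t, terms=40):
    """exp(A t) for a 2x2 matrix, from the truncated Taylor series."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = mat_mul(term, [[a * t / n for a in row] for row in A])
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    return result

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

A = [[-0.5, 1.0], [-1.0, -0.2]]      # hypothetical dissipative field f(x) = A x
sigma = -(A[0][0] + A[1][1])         # sigma = -div f = 0.7

t, h = 1.3, 1e-5
J = det(mat_exp(A, t))               # Jacobian determinant of y -> S_t y
dJ_dt = (det(mat_exp(A, t + h)) - det(mat_exp(A, t - h))) / (2 * h)

print(J, dJ_dt, -sigma * J)          # the last two numbers should agree
```

In this linear example the determinant is $e^{-\sigma t}$, so the second term of (9.7.2) decreases linearly in time and the entropy of any initial density decreases at the constant rate $\sigma$.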


Hence we see that if $\mu_t \to \mu$ the asymptotic average rate of entropy
variation of the system is $-\sigma_+$, with $\sigma_+ \stackrel{def}{=} \langle\sigma\rangle_+$. Since the macroscopic state of the
system does not change as t → ∞ we must interpret $\sigma_+$ as the average entropy
increase of the thermostat that absorbs the heat created in the system by the
forcing.

Naturally we must expect $\sigma_+ \ge 0$. It is reassuring that this is a theorem,
[Ru96a], that $\sigma_+ \ge 0$ for systems with an Axiom A attractor, hence for
systems satisfying the stronger chaotic hypothesis. This is not very surprising
because, by assumption, our system evolves in a bounded region (i.e. phase
space is bounded) so that $\sigma_+ < 0$ would mean that the volume expands
indefinitely, which is impossible.

Because of the above considerations we shall call $\sigma$ the entropy generation
rate and we suppose that it has the property $\sigma(x) = 0$ if the system is
not subject to forcing, so that at zero forcing the evolution is volume
preserving (a property usually true because the nonforced system is, as a rule,
Hamiltonian).

Coming back to our previous point of view, with time evolution described
by a map S on a phase space of “timing events” the entropy creation rate
will be, in this case, identified with the phase space contraction between
one timing event and the next:

$$\sigma(x) = -\log\Big|\det\frac{\partial S(x)}{\partial x}\Big|. \tag{9.7.7}$$


Our analysis concerns idealized systems of the above type that are also
mixing Anosov maps in the sense of §9.4.

We now attempt to deduce other consequences of the chaotic hypothesis,
possibly new (and in any event beyond the existence of the stationary
distribution $\mu$ and the nonnegativity of $\sigma_+$) and measurable in at least some
simple cases. The simplest cases to study are systems whose dynamics is
reversible not only in the nonforced case, but under forcing as well.

Examples of thermostatting mechanisms that generate reversible motions
are provided under rather general circumstances by forces acting on
otherwise Hamiltonian systems and realizing an anholonomic constraint
according to the principle of least constraint of Gauss, also called minimal
constraint principle, see Appendix 9.A4. In the following we shall provide
some simple concrete examples, but it is important to note that the theory
is far more general than the few examples that we shall discuss.

We consider, therefore, a general reversible mechanical system governed by
a smooth equation:

$$\dot x = f(x, G) \tag{9.7.8}$$

depending on several parameters $G = (G_1, \ldots, G_n)$ measuring the strength
of the forces acting on the system and causing the evolution $x \to S_t x$ of
the phase space point x representing the system state in the phase space F
which can be, quite generally, a smooth manifold.




We suppose that the system is “thermostated” so that motions take place 
on bounded smooth invariant surfaces H(x; G) = E, which are level sur- 
faces of some “level function” H. Hence we shall identify, to simplify the 
notation, the phase space F with this level surface which we shall sometimes 
call, somewhat inappropriately, the “energy surface”. 

We suppose also that the flow $S_t$ generated by (9.7.8) is reversible, i.e.
there is a volume preserving smooth map I, “time reversal”, of phase space
such that $I^2 = 1$ and “anticommuting with time”:

$$S_t I = I S_{-t} \tag{9.7.9}$$

i.e. $f(Ix, G) = -(\partial I)^{-1}(x)\, f(x, G)$.

We shall further restrict our attention to mixing Anosov systems that are
reversible, in the above sense, for all values of the forcing parameters G of
interest and dissipative at G ≠ 0. This means that the systems we consider
are such that, see (9.7.6):

$$\sigma_+ = \langle\sigma\rangle_+ > 0 \quad \text{for } G \ne 0 \tag{9.7.10}$$


Under the above assumptions one can define, for $\langle\sigma\rangle_+ > 0$, the
“dimensionless average entropy creation rate” p by setting:

$$p = \frac{1}{\langle\sigma\rangle_+}\,\frac{1}{\tau}\int_{-\tau/2}^{\tau/2} \sigma(S_t x; G)\,dt \tag{9.7.11}$$

Then the probability distribution of the variable p with respect to the SRB
distribution $\mu$ can be written for large $\tau$ as $\pi_\tau(p)dp = \text{const}\ e^{-\tau\zeta_\tau(p)}dp$, see
[Si77], and the function $\zeta(p) = \lim_{\tau\to\infty}\zeta_\tau(p)$ satisfies, if $\sigma_+ \equiv \langle\sigma\rangle_+ > 0$ and
$|p| < p^*$ for a suitable $p^* \ge 1$, the property:

$$\zeta(-p) = \zeta(p) + p\sigma_+, \qquad |p| < p^* \tag{9.7.12}$$


which is called the fluctuation theorem, and is part of a class of theorems
proved in [GC95], see also [Ga95a], for discrete time systems, and in [Ge98],
for continuous time systems. This theorem can be considerably extended, as
discussed in [Ga96b], [Ga98b] and the extension can be shown to imply, in the
limit G → 0 (when also $\sigma_+ \to 0$) relations that can be identified in various
cases with Green-Kubo’s formulae and Onsager’s reciprocal relations, see
also [GR97], [Ga98d] and §9.9 below.

Similar theorems can be proved for suitable nonstationary probability
distributions and, in fact, preceded the above, [ES94], or for nondeterministic
evolutions, [Ku97], [LS98]. In the closest cases the relations between the
latter theorems and the above is sometimes “only” an interchange of limits:
it is precisely in the analysis of this interchange that the chaotic hypothesis
plays a major role, see [CG99].


13 For instance if $-\sigma(x; G)$ is the rate of change of the volume element of F near x and if
F is a Euclidean space then $\sigma(x) = -\sum_\alpha \partial_\alpha f_\alpha(x; G)$. If x = (p, q), and Ix = (−p, q)
and f is a Hamiltonian part plus a p-dependent term due to the “thermostat forces”
then $\sigma(Ix, G) = -\sigma(x, G)$.

The interest of (9.7.12) is its universal nature, i.e. its (reversible) system 
independence, and the fact that it contains no free parameter. 
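A relation of the same form can be verified exactly in a toy stochastic model, which may help in reading (9.7.12). The sketch below is only an analogue and not the Anosov setting of the text: it assumes a biased random walk taking a step +1 with probability `q` and −1 otherwise, and declares (as an assumption of the toy model) that a forward step produces entropy $L = \log(q/(1-q))$ and a backward step produces $-L$.

```python
import math

def prob_net(tau, q, m):
    """Probability that a walk of tau steps (+1 with probability q,
    -1 otherwise) ends with net displacement m."""
    if (tau + m) % 2 or abs(m) > tau:
        return 0.0
    ups = (tau + m) // 2
    return math.comb(tau, ups) * q ** ups * (1 - q) ** (tau - ups)

q, tau = 0.7, 40
L = math.log(q / (1 - q))            # entropy produced by one forward step
sigma_plus = (2 * q - 1) * L         # average entropy production per step

for m in range(2, tau + 1, 2):
    p = m * L / (tau * sigma_plus)   # dimensionless entropy production
    ratio = prob_net(tau, q, m) / prob_net(tau, q, -m)
    # analogue of (9.7.12): log[pi(p) / pi(-p)] = tau * p * sigma_plus
    assert math.isclose(math.log(ratio), tau * p * sigma_plus, rel_tol=1e-9)
```

Writing $\pi_\tau(p) \propto e^{-\tau\zeta_\tau(p)}$, the asserted identity reads $\zeta_\tau(-p) - \zeta_\tau(p) = p\sigma_+$, with no free parameter; in this toy model it holds at every finite $\tau$, whereas for Anosov systems it is a statement about the limit $\tau \to \infty$.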

The connection with applications of the above results is made via the as- 
sumption that concrete chaotic dynamical systems can be considered, “for 
the purpose of studying macroscopic properties”, as mixing Anosov flows. 

The fluctuation theorem proof is quite simple and it is based on the ex- 
pression (9.6.4): it will be discussed in the forthcoming sections. 

As a concluding comment we note that the probability distribution of p
can be regarded as the probability distribution of the sum of the “local
Lyapunov exponents”; if one defines it as the sum of the eigenvalues $\lambda_j(x)$
of the matrix $\partial S_\tau(S_{-\tau/2}x)$: $\tau p\langle\sigma\rangle_+ = -\sum_j \lambda_j(x)$.

If a system is an Anosov system then it has been proved that the
probability distribution of the sum $-p\sigma_+\tau$ of the local Lyapunov exponents has a
density of the form $e^{-\zeta(p)\tau}$, [Si77]. One says that the distribution is
multifractal if $\zeta(p)$ is not linear. This means that the sum of the local Lyapunov
exponents has wide fluctuations around its average (given by $-\tau\sigma_+$). The
fluctuation theorem says that the odd part of the multifractal distribution
is always linear, in reversible systems, so that multifractality of the
volume contraction rate is, in such systems, related to the even part of its
distribution.


Is the above (9.7.12) an observable relation? In fact it was observed in a 
numerical experiment with 56 particles modeling a (reversible) gas in a shear 
flow, [ECM93], and the attempt at theoretical prediction of the observed 
results led to the chaotic hypothesis and to the derivation discussed in §9.9.

It has then been observed in a sequence of experiments with 2 and 10 hard
core particles, [BGG97], moving among fixed obstacles in a periodic box 
and subject to a constant field and thermostatted with a force necessary 
to maintain a constant total kinetic energy in spite of the action of the 
field. The force is selected among the several possible force laws as the 
one satisfying Gauss’ principle of minimal constraint (so that the resulting 
equations are reversible, see Appendix 9.A4). 

One can also consider systems in which the forcing has a “thermal nature” 
like systems enclosed in boxes whose walls are kept at constant temperature 
(depending however on which side of the walls one considers). Also such 
systems can be modeled with equations of motion which can be reversible, 
for instance see [Ga96b]. A very interesting numerical experiment has been 
performed on a chain of oscillators (in number up to 104) interacting with 
inelastic forces and with the oscillators at the extremes forced to have a 
“given temperature” by acting on them with suitable forces, [LLP97]. 

An experiment of a completely different kind, on a sample of water in
convective chaotic motion (not too strongly chaotic, however) has been
performed recently. Its interpretation in terms of the fluctuation theorem (or
rather of its extensions discussed in [Ga96c], [Ga97a], because a fluid is
strongly dissipative and one cannot expect that the attractor is dense in
phase space, even in developed turbulence states) is still under analysis,
[CL98].

An important prediction of the fluctuation theorem in strongly chaotic
particle systems is that the slope of the graph of $\zeta(-p) - \zeta(p)$ is precisely
$\sigma_+$. This shows that even if the distribution of p was Gaussian, i.e. $\zeta(p) =
\frac{1}{2D}(1-p)^2$ for some D > 0, the theorem would be nontrivial. In fact it
would be $\zeta(-p) - \zeta(p) = \frac{2}{D}p$ and there would be no a priori reason to have
$\frac{2}{D} = \sigma_+$.
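The gap between a Gaussian fit and the exact slope can be made quantitative in a toy stochastic model (an illustration only, not one of the systems discussed in the text). Assume a biased random walk with forward probability `q`, entropy production $\pm\log(q/(1-q))$ per step, and $\zeta$ normalized so that $\pi_\tau(p) \propto e^{-\tau\zeta(p)}$; then the Gaussian approximation has $D = \tau\,\mathrm{Var}(p) = 4q(1-q)/(2q-1)^2$, and its slope $2/D$ can be compared with $\sigma_+$.

```python
import math

def gaussian_slope(q):
    """Slope 2/D of zeta(-p) - zeta(p) predicted by the Gaussian approximation,
    with D = tau * Var(p) for the biased walk (toy model assumption)."""
    D = 4 * q * (1 - q) / (2 * q - 1) ** 2
    return 2 / D

def sigma_plus(q):
    """Average entropy production per step of the biased walk."""
    return (2 * q - 1) * math.log(q / (1 - q))

for q in (0.51, 0.6, 0.7):
    print(q, gaussian_slope(q), sigma_plus(q))
```

Far from equilibrium (q = 0.7) the two slopes differ by more than ten per cent, so the distribution cannot be Gaussian over the whole range of p even though the fluctuation relation holds exactly for this walk; close to equilibrium (q near 1/2) they agree to leading order, which is the toy counterpart of the link with linear response theory mentioned below.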

In general one expects the distribution $\zeta(p)$ to be Gaussian near the average
value of p (which is 1 by definition), [Si77], but the Gaussian approximation
should be correct only for $|p - 1| = O(\tau^{-1/2})$ (“central limit theorem”).
Hence it becomes important to test not only the linearity of $\zeta(-p) - \zeta(p)$
but also the slope of this linear law and whether the distribution $\zeta(p)$ can
be regarded as Gaussian.

One finds in the first two experiments the correct value of the slope but one 
cannot really distinguish whether the distribution of p is Gaussian or not. 
Although one can see a priori that it is not Gaussian, the non-Gaussian 
nature of the distribution is not observable because it manifests itself in 
a region so far away from p = 1 that the corresponding huge fluctuations 
cannot be observed, being too rare. The attempt at understanding the rela- 
tion between the central limit theorem and the large deviation experimental 
results on the fluctuation theorem led to the idea that there was a relation 
between the fluctuation theorem and the linear response theory of Onsager 
and Green-Kubo. This in fact was found in [Ga96a], [Ga96b], [Ga98d].

In the third experiment, [LLP97], one finds the correct value of the slope
in a situation in which $\zeta(p)$ is manifestly not Gaussian. Hence this is a key
experiment for the theory.

Finally in the fourth experiment, [CL98], one gets a linear graph for the odd
part of the large deviation function $\zeta(p)$, but the slope is not that of (9.7.12)
but considerably smaller. The system in this case, unlike the previous one, is
certainly so dissipative that the attractor is much smaller than phase space
(the space of the temperature and velocity fields of the sample of water)
and the slope was certainly not expected to be $\sigma_+$, [BGG97], [BG97]. This
might be due to the fact that the system is not reversible, or that it is not
equivalent to a reversible one, or that the chaotic hypothesis is incorrect
in this case. But the matter requires further investigation, because in the
earlier work [Ga96c], [Ga97a] it was shown that in such cases one could
expect a slope P < 1.

A final comment on the observability of the fluctuation theorem in large
systems: since the function $\zeta(p)$ is expected to be proportional to the volume
of the system, or at least to the surface of its container (depending on the size
of the region where dissipation really occurs), it is impossible to observe the
fluctuation relation in macroscopic systems because the fluctuations have
too small a probability. However in some cases it is possible to derive a
“local fluctuation theorem” which concerns the fluctuations of the entropy
creation rate in a microscopic region. In such cases the fluctuations are
observable. One is in a situation similar to that of density fluctuations
in equilibrium. One cannot see density fluctuations of a gas in a large
macroscopic container, but one can quite easily see density fluctuations in a
small microscopic volume, and the functions $\zeta$ that control such deviations
are simply proportional (their ratio being the ratio of the corresponding
volumes). It would be interesting to formulate local fluctuation theorems
as generally as possible, beyond the few examples known, [Ga98e].


§9.8. Fluctuation Patterns


It is natural to inquire whether there are more direct and physical inter- 
pretations of the theorem (hence of the meaning of the chaotic hypothesis) 
when the external forcing is really different from the value 0. A result in 
this direction is the conditional reversibility theorem, discussed below. 

Consider an observable F which, for simplicity, has a well-defined time
reversal parity: $F(Ix) = \varepsilon_F F(x)$, with $\varepsilon_F = \pm 1$. For simplicity suppose
that its time average (i.e. its SRB average) vanishes, $\langle F\rangle_+ = 0$, and let
$t \to \varphi(t)$ be a smooth function vanishing for |t| large enough. We look
at the probability, relative to the SRB distribution (i.e. in the “natural
stationary state”) that $F(S_t x)$ is close to $\varphi(t)$ for $t \in [-\frac{\tau}{2}, \frac{\tau}{2}]$. We say that
F “follows the fluctuation pattern” $\varphi$ in the time interval $t \in [-\frac{\tau}{2}, \frac{\tau}{2}]$.

No assumption on the fluctuation size (i.e. on the size of $\varphi$), nor on the
size of the forces keeping the system out of equilibrium, will be made.
Besides the chaotic hypothesis we assume, however, that the evolution is time
reversible also out of equilibrium and that the phase space contraction rate
$\sigma_+$ is not zero (the results hold no matter how small $\sigma_+$ is; and they make
sense even if $\sigma_+ = 0$, but they become trivial).

We denote by $\zeta(p, \varphi)$ the large deviation function for observing in the
time interval $[-\frac{\tau}{2}, \frac{\tau}{2}]$ an average contraction of phase space
$\sigma_\tau \stackrel{def}{=} \frac{1}{\tau}\int_{-\tau/2}^{\tau/2} \sigma(S_t x)\,dt = p\sigma_+$ and at the same time a fluctuation pattern
$F(S_t x) = \varphi(t)$.

This means that the probability that the dimensionless average entropy
creation rate p is in an interval $\Delta = (a, b)$ and, at the same time, F is in a
neighborhood14 $U_{\psi,\eta}$ of $\varphi$, is given by

$$\sup_{p\in\Delta,\ \varphi\in U_{\psi,\eta}} e^{-\tau\zeta(p,\varphi)} \tag{9.8.1}$$

to leading order as $\tau \to \infty$ (i.e. the logarithm of the mentioned probability
divided by $\tau$ converges as $\tau \to \infty$ to $\sup_{p\in\Delta,\ \varphi\in U_{\psi,\eta}} -\zeta(p,\varphi)$).


14 By “neighborhood” $U_{\psi,\eta}$ we mean that $\int_{-\tau/2}^{\tau/2} \psi(t)F(S_t x)\,dt$ is approximated within
given $\eta > 0$ by $\int_{-\tau/2}^{\tau/2} \psi(t)\varphi(t)\,dt$ for $\psi$ in the finite collection $\psi = (\psi_1,\ldots,\psi_m)$ of test
functions. This is, essentially, what is called in mathematics a “weak neighborhood”.

Given a reversible, dissipative, mixing Anosov flow the fluctuation pattern
$t \to \varphi(t)$ and the time reversed pattern $t \to \varepsilon_F\varphi(-t)$ are then related by
the following:


Conditional reversibility theorem: Consider a function $t \to \varphi(t)$ and an
observable F with defined time reversal parity $\varepsilon_F = \pm 1$. Let $\tau$ be large
and consider the fluctuation pattern $\{\varphi(t)\}_{t\in[-\frac{\tau}{2},\frac{\tau}{2}]}$ and its time reversal
$\{I\varphi(t)\}_{t\in[-\frac{\tau}{2},\frac{\tau}{2}]} \equiv \{\varepsilon_F\varphi(-t)\}_{t\in[-\frac{\tau}{2},\frac{\tau}{2}]}$; they will be followed with equal
likelihood if the first is conditioned to an entropy creation rate p and the second
to the opposite −p. This is an interpretation of the following result:

$$\zeta(-p, I\varphi) = \zeta(p, \varphi) + p\sigma_+ \quad \text{for } |p| < p^* \tag{9.8.2}$$

with $\zeta$ introduced above and a suitable $p^* \ge 1$.


In other words, in these systems, while it is very difficult to see an “anomalous”
average entropy creation rate during a time $\tau$ (e.g. $p = -1$), it is also
true that “that is the hardest thing to see”. Once we see it all the observables
will behave strangely and the relative probabilities of time reversed patterns 
will become as likely as those of the corresponding direct patterns under 
“normal” average entropy creation regime. 

A waterfall will go up, as likely as we see it going down, in a world in which 
for some reason, or by the deed of a Daemon, the entropy creation rate has 
changed sign during a long enough time. We can also say that the motion 
on an attractor is reversible, even in the presence of dissipation, once the 
dissipation is fixed. 

The proof of the above theorem is similar to that of the fluctuation theorem
to which it reduces if $F = \varphi = 0$ (and in fact it is a repetition of it). To be
complete we sketch, in the next section, the proof.

The fluctuation and the conditional reversibility theorems can also be for- 
mulated for systems whose evolution is studied in continuous time (i.e. for 
Anosov flows). The discrete case is simpler to study than the corresponding 
Anosov flows because Anosov maps do not have a trivial Lyapunov expo- 
nent (the vanishing one associated with the phase space flow direction); the 
techniques to extend the analysis to Anosov flows are developed in [BR75], 
[Ge98] (and one achieves the goal of proving the analogue of the fluctuation 
theorem for such systems). 


§9.9. “Conditional Reversibility” and “Fluctuation Theorems”


In §9.4 we have seen that in an Anosov system the stable and unstable
tangent planes $T^s_x$, $T^u_x$ form an integrable family of planes (and their integral
surfaces are the stable and unstable manifolds). If $x$ is a point and if $J(x) =
\partial S(x)$ is the Jacobian matrix of $S$ at $x$, then the covariance of the stable and
unstable planes implies that we can regard its action (mapping the tangent
plane $T_x$ onto $T_{Sx}$) as “split” linearly into an action on the stable plane and
one on the unstable plane: i.e. $J(x)$ restricted to the stable plane becomes
a linear map $J^s(x)$ mapping $T^s_x$ to $T^s_{Sx}$. Likewise one can define the map
$J^u(x)$, [Ru79].

Let $\Lambda_u(x)$, $\Lambda_s(x)$ be the determinants of the Jacobians, i.e. of $J^u(x)$, $J^s(x)$.
Their product differs from the determinant $\Lambda(x)$ of $\partial S(x)$ by the ratio of
the sine of the angle $a(x)$ between the planes $T^s_x$, $T^u_x$ and the sine of the
angle $a(Sx)$ between $T^s_{Sx}$, $T^u_{Sx}$.


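Hence
$$\Lambda(x) = \frac{\sin a(Sx)}{\sin a(x)}\,\Lambda_s(x)\,\Lambda_u(x) \qquad (9.9.1)$$
and we also set, see §9.5,
$$\Lambda_{u,\tau}(x) = \prod_{j=-\tau/2}^{\tau/2-1} \Lambda_u(S^j x), \qquad
\Lambda_{s,\tau}(x) = \prod_{j=-\tau/2}^{\tau/2-1} \Lambda_s(S^j x), \qquad
\Lambda_\tau(x) = \prod_{j=-\tau/2}^{\tau/2-1} \Lambda(S^j x). \qquad (9.9.2)$$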
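The way the sine factors of (9.9.1) combine inside the products (9.9.2) can be made explicit. The following short derivation is a sketch that uses only (9.9.1) and (9.9.2); it shows that the intermediate sines telescope, so that only the angles at the two endpoints of the orbit segment survive:

```latex
\Lambda_\tau(x)
 = \prod_{j=-\tau/2}^{\tau/2-1} \frac{\sin a(S^{j+1}x)}{\sin a(S^{j}x)}\,
   \Lambda_s(S^j x)\,\Lambda_u(S^j x)
 = \frac{\sin a(S^{\tau/2}x)}{\sin a(S^{-\tau/2}x)}\,
   \Lambda_{s,\tau}(x)\,\Lambda_{u,\tau}(x) .
```

Hence $\Lambda_\tau(x)$ and $\Lambda_{s,\tau}(x)\Lambda_{u,\tau}(x)$ differ by a single ratio of sines, whatever the value of $\tau$.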


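Time reversal symmetry, which we assume here, implies that $W^s_{Ix} = I W^u_x$,
$W^u_{Ix} = I W^s_x$ and:

$$\Lambda_\tau(x) = \Lambda_\tau(Ix)^{-1}, \quad \Lambda_{s,\tau}(x) = \Lambda_{u,\tau}(Ix)^{-1}, \quad \Lambda_{u,\tau}(x) = \Lambda_{s,\tau}(Ix)^{-1}, \quad
\sin a(x) = \sin a(Ix). \qquad (9.9.3)$$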


In §9.5 we have seen that, given the above geometric-kinematical notions,
the SRB distribution $\mu$ can be represented by assigning suitable weights to
small phase space cells, (9.6.2). This is very similar to the representation
of the Maxwell-Boltzmann distributions of equilibrium states in terms of
suitable weights given to phase space cells of equal Liouville volume.

The phase space cells can be made, see §9.5, consistently as small as we
please and, by taking them small enough, one can achieve an arbitrary
precision in the description of the SRB distribution $\mu$, in the same way as
we can approximate the Liouville volume by taking the phase space cells
small.

The key to the construction and to our proof is a Markov partition,
introduced in §9.5: this is a partition $\mathcal{E} = (E_1,\ldots,E_N)$ of the phase space
$\mathcal{C}$ into $N$ cells which are covariant with respect to the time evolution and
with respect to time reversal in the sense that $I E_j = E_{j'}$ for some $j'$, see
§9.5 for the notion of covariance and for the properties of Markov partitions.


Given a Markov partition $\mathcal{E}$ we can “refine” it “consistently”, see §9.6, as
much as we wish by considering the partition $\mathcal{E}_{-T,T} = \bigvee_{j=-T}^{T} S^{-j}\mathcal{E}$ whose
cells are obtained by “intersecting” the cells of $\mathcal{E}$ and of its $S$ iterates; the
cells of $\mathcal{E}_{-T,T}$ become exponentially small with $T \to \infty$ as a consequence
of the hyperbolicity. In each $E_j \in \mathcal{E}_{-T,T}$ one can select a center point
$x_j = c(E_j)$ (associated with an arbitrary boundary condition in the sense
of remark (6) in §9.6, see also (9.6.1)), so that $I x_j$ is the point selected in
$I E_j$. Then we evaluate the expansion rate $\Lambda_{u,2T}(x_j)$ of $S^{2T}$ as a map of the
unstable manifold of $S^{-T} x_j$ to that of $S^{T} x_j$, see (9.6.4).
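Using the elements $E_j \in \mathcal{E}_{-T,T}$ as cells we can define approximations “as
good as we wish” to the SRB distribution $\mu$, as given by (9.6.4), because
for all smooth observables $F$ defined on $\mathcal{C}$, [Si68],[BR75],

$$\int_{\mathcal{C}} \mu(dy)\,F(y) = \lim_{T\to\infty} \int m_T(dy)\,F(y) \overset{\mathrm{def}}{=}
\lim_{T\to\infty} \frac{\sum_{E_j\in\mathcal{E}_{-T,T}} F(x_j)\,\Lambda^{-1}_{u,2T}(x_j)}{\sum_{E_j\in\mathcal{E}_{-T,T}} \Lambda^{-1}_{u,2T}(x_j)} \qquad (9.9.4)$$

where $m_T(dy)$ is implicitly defined here by the ratio in the right-hand side
of (9.9.4); (9.9.4) is just a change of notation away from (9.6.4).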
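The weighting in (9.9.4) can be tried numerically on a toy example. The sketch below is only an illustration under stated assumptions: instead of an Anosov map it uses a piecewise linear expanding map of the interval $[0,1]$ with two branches (a hypothetical stand-in, chosen because its cells and expansion rates are explicit and its invariant distribution is the Lebesgue measure). The cells are the cylinder sets of length `n`, the "centers" are their midpoints, and each cell is weighted by the inverse of the expansion of the `n`-th iterate on it. All names (`srb_average`, `a`, `n`) are introduced here for illustration.

```python
from itertools import product


def srb_average(F, a=0.3, n=12):
    """Approximate the invariant average of F for a toy two-branch expanding map.

    The map is f(x) = x / a on [0, a) and f(x) = (x - a) / (1 - a) on [a, 1].
    Cells are labelled by words of n symbols; each cell gets the weight
    1 / (expansion of f^n on the cell), in analogy with the weights in (9.9.4).
    """
    # Inverse branches of f and the corresponding contraction factors.
    inverse_branches = (lambda y: a * y, lambda y: a + (1.0 - a) * y)
    contraction = (a, 1.0 - a)

    numerator = 0.0
    denominator = 0.0
    for word in product((0, 1), repeat=n):
        x = 0.5        # midpoint of [0, 1]; affine maps send it to the cell midpoint
        weight = 1.0   # inverse expansion rate of f^n on this cell
        for symbol in reversed(word):
            x = inverse_branches[symbol](x)
            weight *= contraction[symbol]
        numerator += F(x) * weight
        denominator += weight
    return numerator / denominator


estimate = srb_average(lambda x: x * x)
```

For this toy map the weighted sum is a midpoint Riemann sum, so `estimate` should approach the Lebesgue average of $x^2$ on $[0,1]$ as `n` grows. In a genuinely dissipative Anosov system the weights are not cell volumes, which is the whole point of (9.9.4).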

Let $\Delta_p$ denote an interval $[p, p + dp]$ and $U_\eta$ denote a set of functions
$t \to \varphi(t)$ defined in $[-\frac{\tau}{2}, \frac{\tau}{2}]$ and with values in a “tube” of width $\eta$ around
a given “path” $t \to \varphi(t)$, $t \in [-\frac{\tau}{2}, \frac{\tau}{2}]$. Let $\varepsilon U_\eta$ be the time reversed set of
paths, i.e. the set of paths $t \to \varphi'(t)$, $t \in [-\frac{\tau}{2}, \frac{\tau}{2}]$ with values within $\eta$ of
$\varepsilon_F \varphi(-t)$. Here time is discrete (but the same ideas and deductions would
apply to a continuous time case, so that we use a notation that makes sense
in both cases).

We first evaluate the probability, with respect to the distribution $m_{\tau/2}$ in
(9.9.4), (instead of the $m_T$), of the event that $p = p(x_j) = \sigma_\tau(x_j)/\langle\sigma\rangle_+ \in
\Delta_p$ and, also, that $\{F(S^k x_j)\}_{k=-\tau/2}^{\tau/2} \in U_\eta$, divided by the probability (with
respect to the same distribution) of the time reversed event that $p(x_j) =
\sigma_\tau(x_j)/\langle\sigma\rangle_+ \in \Delta_{-p}$ and, also, $k \to \{F(S^k x_j)\} \in \varepsilon U_\eta$.

Thus we compare the probability of a fluctuation pattern $\varphi$ in the presence
of average dissipation $p$ and that of the time reversed pattern in the presence
of average dissipation $-p$. This is essentially:

$$\frac{\pi_\tau(p)}{\pi_\tau(-p)} = \frac{\sum_{j,\ p(x_j)=p,\ F(S^k x_j)=\varphi(k)} \Lambda^{-1}_{u,\tau}(x_j)}{\sum_{j,\ p(x_j)=-p,\ F(S^k x_j)=\varepsilon_F\varphi(-k)} \Lambda^{-1}_{u,\tau}(x_j)} \qquad (9.9.5)$$

Since $m_{\tau/2}$ in (9.9.4) is only an approximation to $\mu_+$ an error is involved
in using (9.9.5) as a formula for the same ratio computed by using the true
SRB distribution $\mu_+$ instead of $m_{\tau/2}$.

It can be shown that this “first” approximation (among the two that will be
made) can be estimated to affect the result only by a factor bounded above
and below uniformly in $\tau, p$. This is not completely straightforward: in a
sense this is perhaps the main technical problem of the analysis.$^{15}$ Further
mathematical details can be found in [Ga95c],[Ru97c],[Ge98].


15 It can be seen if one interprets (9.9.5) as a probability distribution on the space of the
symbolic sequences $\underline{\sigma}$ which, via the Markov partition $\mathcal{E}$, can be used to represent the
points $x$ in phase space. Such probability distribution can also be interpreted as a Gibbs
distribution over the space of the sequences $\underline{\sigma}$ with potential $A(\underline{\sigma}) = \log \Lambda_{u,1}(x(\underline{\sigma}))$, if
$\underline{\sigma}$ is the symbolic sequence corresponding to $x$: see §5.10. In this way the property
under analysis (i.e. the identity of the limits as $\tau \to \infty$ of (9.9.5) and of the same ratios
evaluated by using $m_T$ instead of $m_{\tau/2}$), appears simply due to the nonexistence
Remark: There are other representations of the SRB distributions that
seem more appealing than the above one based on the Markov partitions 
notion and still make the above analysis possible and apparently more intu- 
itive, e.g. see [MR97]. The simplest is perhaps the periodic orbits represen- 
tation in which the role of the cells is taken by the periodic orbits. However 
I do not know a way of making the argument that leads to (9.9.5) while 
keeping under control the approximations and at the same time not relying 
on Markov partitions; and in fact I do not know of any expression of the 
SRB distribution that is not proved by using the very existence of Markov 
partitions. 


We now try to establish a one-to-one correspondence between the addends 
in the numerator of (9.9.5) and those in the denominator, aiming at showing 
that corresponding addends have a constant ratio which will, therefore, be 
the value of the ratio in (9.9.5). 

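This is possible because of the reversibility property; it will be used in the
form of its consequences given by the relations (9.9.3). The ratio (9.9.5) can
therefore be written, by virtue of (9.9.3), simply as:

$$\frac{\sum_{j,\ p(x_j)=p,\ F(S^k x_j)=\varphi(k)} \Lambda^{-1}_{u,\tau}(x_j)}{\sum_{j,\ p(x_j)=-p,\ F(S^k x_j)=\varepsilon_F\varphi(-k)} \Lambda^{-1}_{u,\tau}(x_j)}
= \frac{\sum_{j,\ p(x_j)=p,\ F(S^k x_j)=\varphi(k)} \Lambda^{-1}_{u,\tau}(x_j)}{\sum_{j,\ p(x_j)=p,\ F(S^k x_j)=\varphi(k)} \Lambda_{s,\tau}(x_j)} \qquad (9.9.6)$$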


where $x_j \in E_j$ is the center in $E_j$. In deducing the second relation we take
into account:


(1) time reversal symmetry $I$,


(2) that the centers $x_j$, $x_{j'}$ of $E_j$ and $E_{j'} = I E_j$ are such that $x_{j'} = I x_j$,
and


(3) that (9.9.3), (9.9.2) hold,


and transform the sum in the denominator of the left-hand side of (9.9.6)
into a sum over the same set of labels that appear in the numerator sum.
It follows then that the ratio between corresponding terms in the ratio
(9.9.6) is equal to $\Lambda^{-1}_{u,\tau}(x)\,\Lambda^{-1}_{s,\tau}(x)$. This differs little from the reciprocal of
the total change of phase space volume over the $\tau$ time steps (during which
the system evolves from the point $S^{-\tau/2}x$ to $S^{\tau/2}x$).

The difference is only due to not taking into account the ratio of the sines
of the angles $a(S^{-\tau/2}x_j)$ and, see (9.9.2), $a(S^{\tau/2}x_j)$ formed by the stable
and unstable manifolds at the points $S^{-\tau/2}x_j$ and $S^{\tau/2}x_j$. Therefore

of phase transitions in the one-dimensional short-range Ising models. In fact the two
ratios become ratios of expectation values of the same quantities evaluated in presence
of different boundary conditions, and in absence of phase transitions one should have
boundary conditions independence which in this case would imply that the two ratios
differ at most by a factor of order $O(1)$ so that their logarithms divided by $T$ or $\tau$
should have the same limit $\zeta(p) - \zeta(-p)$.


$\Lambda^{-1}_{u,\tau}(x_j)\,\Lambda^{-1}_{s,\tau}(x_j)$ will differ from the actual phase space contraction under
the action of $S^\tau$, regarded as a map between $S^{-\tau/2}x_j$ and $S^{\tau/2}x_j$, by a factor
that can be bounded between $B^{-1}$ and $B$ with $B = \max_{x,x'} \frac{|\sin a(x)|}{|\sin a(x')|}$,
which is finite and positive, by the linear independence of the stable and
unstable planes.

But for all points $x_j$ in (9.9.6), the reciprocal of the total phase space
volume change over a time $\tau$ equals $e^{\tau\sigma_+ p}$ (by the constraint, $\sigma_\tau/\sigma_+ = p$,
imposed on the summation labels) up to a “second” approximation that
cannot exceed a factor which is bounded above and below by $\tau$-independent
positive and finite constants $B^{\pm 1}$, due to the above sine ratio. Hence the
ratio (9.9.5) will be the exponential $e^{\tau\sigma_+ p}$, up to a $\tau$-independently bounded
factor and (9.8.2) follows.

It is important to note that there have been two approximations, as just
pointed out. They can be estimated, see [GC95],[Ga95c],[Ru97c], and imply
that the argument of the exponential is correct up to $p, \varphi, \tau$-independent
corrections so that the result can be proved even if the approximations are
avoided (this also makes clear that the consideration of the limit $\tau \to \infty$ is
necessary for the theorem to hold).

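In the special cases in which there is no $F$ and one only looks at the
probability distribution of the entropy production rate the above becomes

$$B^{-1}\,e^{\tau p\langle\sigma\rangle_+} < \frac{\pi_\tau(p)}{\pi_\tau(-p)} < B\,e^{\tau p\langle\sigma\rangle_+} \qquad (9.9.7)$$

or

$$\frac{\pi_\tau(p)}{\pi_\tau(-p)} = e^{\tau p\langle\sigma\rangle_+ + O(1)} \qquad (9.9.8)$$

i.e. we get (9.7.12).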
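A simple consistency check of the symmetry expressed by (9.8.2) and (9.9.8) can be made under a purely illustrative assumption, namely that the large deviation function is quadratic around its minimum at $p = 1$ (a Gaussian model that is not claimed to hold for any particular system, and with $\sigma_+$ denoting the same average written $\langle\sigma\rangle_+$ in (9.9.8)):

```latex
\zeta(p) = \frac{\sigma_+}{4}\,(p-1)^2
\quad\Longrightarrow\quad
\zeta(-p) - \zeta(p) = \frac{\sigma_+}{4}\,\big[(p+1)^2 - (p-1)^2\big] = p\,\sigma_+ ,
\qquad
\frac{\pi_\tau(p)}{\pi_\tau(-p)} \simeq e^{-\tau\,[\zeta(p) - \zeta(-p)]} = e^{\tau p\,\sigma_+} .
```

In this model the curvature $\sigma_+/4$ is not free: it is the only value compatible with the symmetry once the minimum is placed at $p = 1$.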


§9.10. Onsager Reciprocity and Green-Kubo’s Formula.


The fluctuation theorem degenerates in the limit in which $\sigma_+$ tends to
zero, i.e. when the external forces vanish and dissipation disappears (and
the stationary state becomes the equilibrium state).

Since the theorem deals with systems that are time reversible at and out- 
side equilibrium, Onsager’s hypotheses are certainly satisfied and the system 
should obey reciprocal response relations at vanishing forcing. This led to 
the idea that there might be a connection between the fluctuation theo- 
rem and Onsager reciprocity and also to the related (stronger) Green-Kubo 
formula. 

This is in fact true: if we define the microscopic thermodynamic flux j(x) 
associated with the thermodynamic force E that generates it, i.e. the pa- 
rameter that measures the strength of the forcing (which makes the system 
nonHamiltonian), via the relation 


$$j(x) = \frac{\partial \sigma(x)}{\partial E} \qquad (9.10.1)$$

(not necessarily at $E = 0$) then in [Ga96b] a heuristic proof shows that the
limit as $E \to 0$ of the fluctuation theorem becomes simply (in the continuous
time case) a property of the average, or “macroscopic”, flux $J = \langle j \rangle_{\mu_E}$:

$$\frac{\partial J}{\partial E}\Big|_{E=0} = \frac{1}{2}\int_{-\infty}^{\infty} \langle j(S_t x)\,j(x)\rangle_{\mu_E}\Big|_{E=0}\,dt \qquad (9.10.2)$$


where $\langle\cdot\rangle_{\mu_E}$ denotes the average in the stationary state $\mu_E$ (i.e. the SRB
distribution which, at $E = 0$, is simply the microcanonical ensemble).

If there are several fields $E_1, E_2, \ldots$ acting on the system we can define
several thermodynamic fluxes $j_k(x) \overset{\mathrm{def}}{=} \partial_{E_k}\sigma(x)$ and their averages $\langle j_k \rangle_\mu$: in
the limit in which all forces $E_k$ vanish a (simple) extension of the fluctuation
theorem is shown, [Ga96b], to reduce to

$$L_{hk} \overset{\mathrm{def}}{=} \frac{\partial J_h}{\partial E_k}\Big|_{E=0} = \frac{1}{2}\int_{-\infty}^{\infty} \langle j_h(S_t x)\,j_k(x)\rangle_{E=0}\,dt = L_{kh}, \qquad (9.10.3)$$

therefore we see that the fluctuation theorem can be regarded as an extension
to nonzero forcing of Onsager reciprocity and, actually, of the Green-Kubo
formula.
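To see how a transport coefficient is read off from a relation of the Green-Kubo type, here is a minimal numerical sketch. It assumes, purely for illustration, an exponentially decaying equilibrium flux autocorrelation $C(t) = c_0 e^{-|t|/t_c}$ (a hypothetical model, not derived from any system discussed here), for which half of the time integral is exactly $c_0 t_c$. The function name `green_kubo_coefficient` and the parameters `c0`, `tc` are introduced here only for the example.

```python
import math


def green_kubo_coefficient(corr, t_max, steps):
    """Trapezoidal estimate of (1/2) * integral of corr(t) over [-t_max, t_max]."""
    h = 2.0 * t_max / steps
    total = 0.5 * (corr(-t_max) + corr(t_max))
    for i in range(1, steps):
        total += corr(-t_max + i * h)
    return 0.5 * h * total


# Hypothetical correlation function <j(S_t x) j(x)> at zero forcing.
c0, tc = 2.0, 0.5


def correlation(t):
    return c0 * math.exp(-abs(t) / tc)


coefficient = green_kubo_coefficient(correlation, t_max=20.0, steps=200_000)
```

With these values `coefficient` should be close to `c0 * tc`; in an actual simulation the correlation function would be measured along a trajectory, and the cutoff `t_max` would have to be chosen much larger than the correlation time.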

Certainly assuming reversibility in a system out of equilibrium can be dis- 
turbing: one can, thus, inquire if there is a more general connection between 
the chaotic hypothesis, Onsager reciprocity and the Green-Kubo formula. 

This is indeed the case and provides us with a further consequence of the
chaotic hypothesis valid, however, only in zero field. It can be shown that
the relations (9.10.3) follow from the sole assumption that at $E = 0$ the
system is time reversible and that it satisfies the chaotic hypothesis for $E$
near 0: at $E \ne 0$ it can be, as in Onsager’s theory, not reversible [GR97].

It is not difficult to see, technically, how the fluctuation theorem, in the 
limit in which the driving forces tend to 0, formally yields the Green-Kubo 
formula. 

We consider time evolution in continuous time and simply note that
(9.9.8) implies that, for all $E$ (for which the system is chaotic) $\langle e^{I_E} \rangle =
\sum_p \pi_\tau(p)\,e^{-\tau p\langle\sigma\rangle_+} = \sum_p \pi_\tau(-p)\,e^{O(1)} = e^{O(1)}$ so that:

$$\lim_{\tau\to+\infty} \frac{1}{\tau} \log \langle e^{I_E} \rangle_{\mu_E} = 0 \qquad (9.10.4)$$

where $I_E \overset{\mathrm{def}}{=} \int \sigma(S_t x)\,dt$ with $\sigma(x)$ being the divergence of the equations
of motion (i.e. the phase space contraction rate, in the case of continuous
time). This remark, [Bo97b],$^{16}$ can be used to simplify the analysis in
[Ga96b] (and [Ga96a]) as follows.

We switch to continuous time, to simplify the analysis. Differentiating both 
sides with respect to E, not worrying about interchanging derivatives and 


16 It says that essentially $\langle e^{I_E} \rangle_{\mu_E} \equiv 1$ or more precisely it is not too far from 1 so that
(9.10.4) holds.


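limits and the like, one finds that the second derivative with respect to $E$
is a sum of six terms. Supposing that for $E = 0$ the system is Hamiltonian
and (hence) $I_E = 0$, the six terms, when evaluated at $E = 0$, are:

$$\frac{1}{\tau}\Big[ \langle \partial_E^2 I_E \rangle_{\mu_E}\big|_{E=0} - \langle (\partial_E I_E)^2 \rangle_{\mu_E}\big|_{E=0}
+ \int \partial_E I_E(x)\,\partial_E \mu_E(x)\big|_{E=0} - \Big( \langle \partial_E I_E \rangle_{\mu_E} \cdot \int 1\,\partial_E \mu_E \Big)\Big|_{E=0}
+ \int \partial_E I_E(x)\,\partial_E \mu_E(x)\big|_{E=0} + \int 1 \cdot \partial_E^2 \mu_E\big|_{E=0} \Big] \qquad (9.10.5)$$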


and we see that the fourth and sixth terms vanish being derivatives of
$\int \mu_E(dx) \equiv 1$, and the first vanishes (by integration by parts) because $I_E$
is a divergence and $\mu_0$ is the Liouville distribution (by the assumption that
the system is Hamiltonian at $E = 0$ and chaotic). Hence we are left with:


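$$\Big( -\frac{1}{\tau} \langle (\partial_E I_E)^2 \rangle_{\mu_E} + \frac{2}{\tau} \int \partial_E I_E(x)\,\partial_E \mu_E(x) \Big)\Big|_{E=0} = 0 \qquad (9.10.6)$$

where the second term is $2\tau^{-1} \partial_E (\langle \partial_E I_E \rangle_{\mu_E})|_{E=0} \equiv 2\,\partial_E J_E|_{E=0}$,
because the distribution $\mu_E$ is stationary; and the first term tends to
$-\int_{-\infty}^{+\infty} \langle j(S_t x)\,j(x) \rangle_{E=0}\,dt$ as $\tau \to \infty$. Hence we get the Green-Kubo formula
in the case of only one forcing parameter.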

The argument could be extended to the case in which $E$ is a vector
describing the strength of various driving forces acting on the system, but one
needs a generalization of (9.10.4). The latter is a consequence of the
fluctuation theorem, and the theorem has to be extended in order to derive from
it also the Green-Kubo formula (hence reciprocity) when there are several
independent forces acting on the system; see [Ga96b] where the extension
is discussed.

The above analysis is unsatisfactory because we interchange limits and
derivatives quite freely and we even take derivatives of $\mu_E$, which seems to
require some imagination as $\mu_E$ is concentrated on a set of zero volume.
On the other hand, under the strong hypotheses in which we suppose to 
be working (that the system is mixing Anosov), we should not need extra 
assumptions. Indeed the above mentioned nonheuristic analysis, [GR97], is 
based on the solution of the problem of differentiability with respect to a 
parameter for SRB distributions, [Ru97b]. 


§9.11. Reversible Versus Irreversible Dissipation. Nonequilibrium
Ensembles?


What is missing are arguments similar to those used by Boltzmann to
justify the use of the ensembles independently of the ergodic hypothesis: a
hypothesis which in the end may appear (and still does appear to many)
as having led to the theory of ensembles only “by accident”. The missing
arguments should justify the fluctuation theorem on the basis of the extreme
likelihood of its predictions in systems that are very large and that may not
be Anosov systems in the mathematical sense. I see no reason why this
should prove impossible, a priori, now or in the future.

In the meantime it seems interesting to take the same philosophical attitude 
adopted by Boltzmann: not to consider that “by chance” chaotic systems 
share some selected properties, and try to see if such properties help us 
achieve a better understanding of nonequilibrium. After all it seems that 
Boltzmann himself took a rather long time to realize the interplay of the 
above two basic mechanisms behind the equilibrium ensembles and to pro- 
pose a solution harmonizing them. “All it remains to do” is to explore if the 
hypothesis has implications more interesting or deeper than the fluctuation 
theorem. 

A system driven out of equilibrium can reach a stationary state (and not 
steam out of sight) only if enough dissipation is present. This means that 
any mechanical model of a system reaching a stationary state out of equi- 
librium must be a model with nonconservative equations of motion in which 
forces representing the action of the thermostats, that keep the system from 
heating up, are present. 

Thus, as we stressed repeatedly in the previous sections, a generic model of 
a system stationarily driven out of equilibrium will be obtained by adding to 
Hamilton’s equations (corresponding to the nondriven system) other terms 
representing forces due to the thermostat action. 

Here one should avoid attributing a fundamental role to special assump- 
tions about such forces. One has to realize that there is no privileged ther- 
mostat: many of them can be considered and they simply describe various 
ways to take energy out of the system. 

Hence one can even use stochastic thermostats, and there are many types 
considered in the literature; or one can consider deterministic thermostats 
and, among them, reversible ones or irreversible ones. 

Each thermostat requires its own theory. However the same system may 
behave in the same way under the action of different thermostatting mecha- 
nisms: if the only action we make on a gas tube is to keep the temperatures 
of its extremes fixed, by taking in or out heat from them, the difference may 
be irrelevant, at least in the limit in which the tube becomes long enough 
and as far as what happens in the middle of it is concerned. 

But of course the mathematical representation of the stationary state may 
be very different in the various cases, even when we think that the differences 
are only minor boundary effects. 

For instance, in the case of the gas tube, if our model is of deterministic
dissipation we expect the SRB state to be concentrated on a set of zero
phase space volume,$^{17}$ while if the model is stochastic then the stationary
state will be described by a density on phase space. Nothing could seem
more different.


17 Because phase space will on the average contract, when $\sigma_+ > 0$, so that any stationary
state has to be concentrated on a set of zero volume, which however could still be dense
and often it will be.
Nevertheless it might still be true that in the limit of an infinite tube
the two models give the same result, in the same sense as the canonical 
and microcanonical ensembles describe the same state even though the mi- 
crocanonical ensemble is supported on the energy surface, which has zero 
volume if measured by using the canonical ensemble (which is given by a 
density over the whole available phase space). 

Therefore we see that out of equilibrium we have in fact much more free- 
dom to define equivalent ensembles. Not only do we have (very likely) the 
same freedom that we have in equilibrium (like fixing the total energy or 
not, or fixing the number of particles or not, passing from microcanonical 
to canonical to grand canonical, etc.) but we can also change the equations
of motion and obtain different stationary states, i.e. different SRB distri- 
butions, which will however become the same in the thermodynamic limit. 


Being able to prove mathematical equivalence of two thermostats will 
amount to proving their physical equivalence. This again will be a diffi- 
cult task, in any concrete case. 

What I find fascinating is that the above remarks seem to indicate to us the 
possibility that a reversible thermostat can be equivalent in the thermody- 
namic limit to an irreversible one. I conclude by reformulating a conjecture, 
see for instance [Ga96c], [Ga97a], [Ga98c], which clarifies the latter state- 
ment. 

Consider the following two models describing a system of hard balls in 
a periodic (large) box in which there is a lattice of obstacles that forbid 
collisionless paths (by their arrangement and size); the laws of motion will 
be Newton’s laws (elastic collisions with the obstacles as well as between 
particles) plus a constant force E along the horizontal axis (say) plus a 
thermostatting force. 

In the first model the thermostatting force is simply a constant times the
momentum of the particles: it acts on the $i$-th particle as $-\nu p_i$ if $\nu$ is a
“friction” constant. Another model is a force proportional to the momentum
but via a proportionality factor that is not constant and depends on the
system configuration at the point $x$ in phase space; it has the form $-\alpha(x) p_i$
with $\alpha(x) = E \cdot \sum_i p_i / \sum_i p_i^2$.

The first model is related to the model used by Drude in his theory of 
conduction in metals, see [EGM98]. The second model has been used very 
often in recent years for theoretical studies and has thus acquired a “re- 
spected” status and a special importance: it was among the first models 
used in the experiments and theoretical ideas that led to the connection 
between Ruelle’s ideas for turbulent motion in fluids and nonequilibrium 
statistical mechanics, [HHP87], [ECM90], [ECM93]. I think that the im- 
portance of such work should be stressed and fully appreciated: without 
this work the recent theoretical developments would have been unthinkable, 
in spite of the fact that a posteriori they seem quite independent and one 
could claim (unreasonably in my view) that everything could have been 
done much earlier. 
Furthermore the second model can be seen as derived from Gauss’ least
constraint principle, see Appendix 9.A4. It keeps the total (kinetic) energy 
exactly constant over time (taking energy in and out, as needed) and is 
called a Gaussian thermostat. Unlike the first model the second model is 
reversible, with time reversal being the usual velocity inversion. Thus the 
above theory and results based on the chaotic hypothesis apply. 
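That the Gaussian thermostat keeps the kinetic energy fixed can be checked numerically. The sketch below is a simplified illustration, not the model of the conjecture: it assumes unit masses, ignores the obstacles and the collisions (so that only the momenta evolve), and integrates $\dot p_i = E - \alpha p_i$ with the multiplier $\alpha = E \cdot \sum_i p_i / \sum_i p_i^2$ by a standard fourth-order Runge-Kutta step. All function names and numerical values are hypothetical choices made for the example.

```python
def alpha(momenta, field):
    """Gaussian multiplier E . sum_i p_i / sum_i p_i^2 for two-dimensional momenta."""
    numerator = sum(px * field[0] + py * field[1] for px, py in momenta)
    denominator = sum(px * px + py * py for px, py in momenta)
    return numerator / denominator


def rhs(momenta, field):
    """Right-hand side of dp_i/dt = E - alpha * p_i (free particles, unit mass)."""
    a = alpha(momenta, field)
    return [(field[0] - a * px, field[1] - a * py) for px, py in momenta]


def shifted(base, slope, coefficient):
    """Return base + coefficient * slope, componentwise."""
    return [(bx + coefficient * sx, by + coefficient * sy)
            for (bx, by), (sx, sy) in zip(base, slope)]


def rk4_step(momenta, field, dt):
    """One classical Runge-Kutta step for the thermostatted momenta."""
    k1 = rhs(momenta, field)
    k2 = rhs(shifted(momenta, k1, dt / 2.0), field)
    k3 = rhs(shifted(momenta, k2, dt / 2.0), field)
    k4 = rhs(shifted(momenta, k3, dt), field)
    return [
        (px + dt / 6.0 * (a[0] + 2.0 * b[0] + 2.0 * c[0] + d[0]),
         py + dt / 6.0 * (a[1] + 2.0 * b[1] + 2.0 * c[1] + d[1]))
        for (px, py), a, b, c, d in zip(momenta, k1, k2, k3, k4)
    ]


def kinetic_energy(momenta):
    return 0.5 * sum(px * px + py * py for px, py in momenta)


momenta = [(1.0, 0.5), (-0.3, 0.8), (0.2, -1.1)]
field = (1.0, 0.0)  # constant force along the horizontal axis
energy_start = kinetic_energy(momenta)
for _ in range(1000):
    momenta = rk4_step(momenta, field, 0.01)
energy_end = kinetic_energy(momenta)
```

The exact flow conserves $\sum_i p_i^2$ because $\sum_i p_i \cdot (E - \alpha p_i) = 0$ by the choice of $\alpha$, so any drift between `energy_start` and `energy_end` is a discretization error only. With a constant friction $-\nu p_i$ in place of $-\alpha p_i$ the energy would instead relax to a value depending on $\nu$.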

The conjecture was (and is) that: 


(1) Compute the average energy per particle that the system has in the
constant friction case and call it $\mathcal{E}(\nu)$ calling also $\mu_\nu$ the corresponding SRB
distribution.


(2) Call $\tilde\mu_{\mathcal{E}}$ the SRB distribution for the Gaussian thermostat system when
the total (kinetic) energy is fixed to the value $\mathcal{E}$.


(3) Then $\mu_\nu = \tilde\mu_{\mathcal{E}(\nu)}$ in the thermodynamic limit (in which the box size
tends to become infinitely large, with the number of particles and the total
energy correspondingly growing so that one keeps the density and the energy
density constant) and for local observables, i.e. for observables that depend
only on the particles of the system localized in a fixed finite region of the
container. This means that the equality takes place in the usual sense of
the theory of ensembles, see Chap.II,IV and [Ru68].


It has to be remarked that the idea of equivalence between dynamical
ensembles, in contexts perhaps more limited, seems to have been circulating
for quite a long time, particularly among those who work on numerical experiments:
remarkable are the early papers [ES93], [Ev93], [SJ93] which certainly
propose the same kind of ideas, see also [MR96].

The above conjecture opens the way to several speculations as it shows that 
the reversibility assumption might be not so strong after all. And results 
for reversible systems may carry through to irreversible ones. 

I have attempted to extend the above ideas also to cases of turbulent 
motions but here I can only give references, [Ga97a],[Ga97b]. 

There are a few other results and many speculations about the
consequences of the chaotic hypothesis: among the (few) related results I want
to quote the “pairing rule”, valid for a somewhat restricted class of systems.
It is a further extremely interesting example of a mathematical theorem dis- 
covered through physical experiments and, although heralded by a similar 
result in a simpler case, [Dr88], it was proved only later (like the fluctuation 
theorem), [ECM90], [EM90], [DM96], [WL98]. It is related to the chaotic 
hypothesis but it does not depend on it, the relation being that it emerged 
in the same group of experiments that led to the fluctuation theorem. 

Among the “speculations” I quote: 


(1) Several applications to fluid mechanics, like the equivalence of
Navier-Stokes equations to a similar, reversible equation (in the limit of large
Reynolds’ number), [Ga96c], [Ga97a], [SJ93].
(2) Stability of time reversal symmetry whereby under assumptions, that
I think are quite natural, [BG97], one deduces that when time reversal
is spontaneously broken$^{18}$ it is replaced by another symmetry with the
same property of anticommuting with time evolution, thus showing that
the fluctuation theorem might hold, with minor modifications, even in cases
in which the time reversal symmetry of the equations of motion is broken,
see §1.6 of [BGG97].


(3) The possibility of equivalence between reversible and irreversible equa- 
tions describing the same system, in the limit of large systems, [Ga98c]. 
This gives hope that some result like a, suitably reformulated, fluctuation 
theorem can hold even in irreversibly driven systems. Furthermore it seems 
to indicate that the theory of ensembles in nonequilibrium is much richer 
than what we are used to in equilibrium. An ensemble might be character- 
ized not only by the choice of a few parameters, as in Chap.II, but also by 
the choice of the equations of motion. 


The above incomplete list is here only to provide the reader with a guide to 
the literature, which is constantly increasing but which does not yet seem 
established enough to be treated as an accomplished theory deserving more 
space in a short treatise. 


Appendix 9.A1. Mécanique statistique hors équilibre: l’héritage 
de Boltzmann 


The following is a slightly expanded version of a talk at École Normale
Supérieure in Paris, January 1998, and gives an informal overview of the
basic ideas of Chap.IX, see §9.3, and some supplementary analysis.


Boltzmann set out, [Bo66], to prove the existence of atoms by
pursuing a program already begun by his predecessors. His approach
was to establish that the conception of matter as an agglomeration
of atoms obeying the laws of mechanics led to the deduction of the
properties of matter then known to experimentalists and
theoreticians.

Thus Boltzmann produced more and more refined versions of the
heat theorem, [Bo68], [Bo71], [Bo72], [Bo77]. At the beginning the point
was to show that it is possible to define mechanical quantities associated,
for example, with a gas enclosed in a cubic container of volume $V$,
such as:

T = average kinetic energy

U = total energy

V = volume

p = average momentum transferred to the walls
per collision and per unit surface


18 Because the attracting set $A$ becomes strictly smaller than the phase space and $IA \ne A$.
where the averages are computed empirically by supposing the particles
independent and uniformly distributed on a sphere in momentum space
and in the volume $V$ of positions.
The theorem to be proved is then that if one varies $U$ and $V$ by $dU$ and $dV$ and
computes the quantity:

$$\frac{dU + p\,dV}{T}$$

where $p$ and $T$ depend on $U$ and $V$, one finds an exact differential: that
is, there exists a function $S(U,V)$ such that $dS = \frac{dU + p\,dV}{T}$. Following
Boltzmann’s work, Helmholtz considered monocyclic mechanical systems,
i.e. systems in which every motion of given energy is periodic and
nondegenerate (which means that the motions of given energy differ
from one another only by a shift of the observation time),
[He95a],[He95b].

He showed that, in general, if one imagines that the motions (“states”) of
such systems are parameterized by their total energy $U$ and by a parameter
$V$ on which the potentials $\varphi_V$ of the forces acting on the system depend,
then defining:

T = average kinetic energy

U = total energy

V = volume

p = $\langle \partial_V \varphi_V \rangle$ (9.A1.1)

where $\langle F \rangle$ now denotes precisely the average of $F$ with respect
to time (and hence is not defined empirically as in the preceding
case), one finds in general:


$$\frac{dU + p\,dV}{T} = \text{exact differential} \qquad (9.A1.2)$$

This could have been nothing more than a curiosity. But Boltzmann had
a discrete conception of nature: even if he had not said so explicitly
in his popular writings, one would see it in his scientific works,
where the use of analysis, with its integrals and derivatives, is often seen
as a technical means to cope with the computation of sums and
differences, [Bo74].

Hence for Boltzmann motion is nothing but a discrete evolution in which
phase space is divided into small cells of $6N$ dimensions ($N$
being the number of molecules), one of which contains the point representing
the instantaneous state of the system. The evolution appears as the successive
displacements of the representative point from one cell to another, while time
elapses by a small discrete quantity $h$. Of course the displacement must
conform to the laws of motion.

This is a representation very familiar today to anyone who tries to simulate
on a computer the motions of a gas of particles. On the computer
the microscopic states of the gas are represented by cells (because the
coordinates of the points are represented by numbers determined
with a precision that is far from infinite and that depends on the
machine, or rather on the software one uses) and the evolution proceeds by
discrete steps; the program that performs these steps is written with the laws of
motion as a guide.

From this point of view the motion is a permutation of the cells representing the microscopic state. The system is then always in periodic evolution: for every permutation of a finite number of objects (the cells of given total energy $U$, in the present case) generates a cyclic evolution.

Imagine fixing the total energy $U$, with the forces acting on the system parameterized by the volume $V$: in fact one imagines that, whatever one does, the forces between the particles do not vary and only the forces between the particles and the walls can change (because of the motions of the walls and of the resulting changes of volume).

Then Helmholtz's monocyclicity hypothesis, that the motions of given energy are nondegenerate, would amount to saying that the evolution is a one-cycle permutation of the cells, and one would therefore be in the situation in which the system is monocyclic: this hypothesis is known as the ergodic hypothesis.
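The discrete picture can be tried out directly. The following is a minimal sketch, not taken from the references: the twelve "cells", the two permutations and the observable are all hypothetical choices made only for illustration. It decomposes a permutation of cells into cycles and compares a time average along the motion with the uniform average over the cells.

```python
def cycle_decomposition(perm):
    """Cycles of a permutation given as a list: cell i evolves into perm[i]."""
    seen = [False] * len(perm)
    cycles = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        cycle, cell = [], start
        while not seen[cell]:
            seen[cell] = True
            cycle.append(cell)
            cell = perm[cell]
        cycles.append(cycle)
    return cycles


def time_average(perm, observable, start, steps):
    """Average of the observable along `steps` iterations starting from `start`."""
    total, cell = 0.0, start
    for _ in range(steps):
        total += observable[cell]
        cell = perm[cell]
    return total / steps


n_cells = 12
# Hypothetical dynamics: a shift by 5 (coprime to 12) visits every cell,
# while a shift by 4 splits the cells into 4 cycles of length 3.
one_cycle = [(i + 5) % n_cells for i in range(n_cells)]
many_cycles = [(i + 4) % n_cells for i in range(n_cells)]

observable = [float(i * i) for i in range(n_cells)]
uniform_average = sum(observable) / n_cells

ergodic_average = time_average(one_cycle, observable, start=0, steps=n_cells)
nonergodic_average = time_average(many_cycles, observable, start=0, steps=n_cells)
```

With the one-cycle permutation the time average over one period coincides with the uniform average over all cells; with several cycles it depends on the starting cell, which is the situation the ergodic hypothesis excludes.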

Under this hypothesis it should be possible to find, in general, a mechanical analogy of thermodynamics and a general heat theorem. Since the averages must then be computed with the uniform distribution on the space of the cells of given energy (because the cells have equal sizes), one is obliged to check that in the case of a gas (or even of a liquid or a solid, given the generality of the considerations in question):

$$\frac{dU + p\,dV}{T}\ \text{is exact if}\ p = -\Big\langle \frac{\partial\varphi_V}{\partial V} \Big\rangle. \tag{9.A1.3}$$


This property, a special case of a more general property that Boltzmann called orthodicity, must be accompanied by the further property that $p$ is also the average momentum transferred to the walls by the collisions, per unit time and unit surface. If this is indeed the case, one will have proved that a heat theorem holds in general.

This is what Boltzmann did in 1884, [Bo84], founding at the same time the theory of statistical ensembles (which is often attributed to Gibbs, but not by Gibbs himself, [Gi81]).

It is quite remarkable that the heat theorem, (9.A1.2), holds for small systems (even with one particle, if the nonlinearity of the motion is sufficient to make the ergodic hypothesis reasonable) as well as for large ones (with $10^{23}$ particles).

The exactness of $\frac{dU + p\,dV}{T}$ does not depend on the size of the system, [Bo84].
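As a concrete one-particle check of (9.A1.2), here is a worked example added for illustration (it is not taken from the references). It assumes a single particle of mass $m$ moving with speed $v$ and bouncing elastically between the ends of a one-dimensional box of length $V$, so that every motion of given energy is periodic and the motions differ only by a time shift:

```latex
\begin{aligned}
U &= \tfrac12 m v^2, \qquad T = \text{average kinetic energy} = U, \\
p &= \frac{2mv}{2V/v} = \frac{m v^2}{V} = \frac{2U}{V}
   \qquad \text{(momentum $2mv$ given to the movable wall once every $2V/v$)}, \\
\frac{dU + p\,dV}{T} &= \frac{dU}{U} + 2\,\frac{dV}{V} = d\,\log\!\big(U V^2\big).
\end{aligned}
```

In this example the function $S(U,V) = \log(U V^2)$ plays the role of the entropy, up to an additive constant.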


This independence is, moreover, an absolutely fundamental property, and it allowed Boltzmann to free himself from the criticisms addressed to him.

The criticisms, by Zermelo and even by Poincaré, were subtle and concerned the principle that it would be impossible to deduce the (irreversible) macroscopic laws from a reversible mechanics which is necessarily cyclic and therefore apparently not irreversible (after a recurrence time the system returns to its initial state, against every intuition about the behavior of macroscopic systems), [Bo96], [Bo97].

These criticisms were aimed above all at the Boltzmann equation and consequently at the irreversible approach to equilibrium. Boltzmann, as is well known, replied that one could not ignore the time scales needed to reveal contradictions. To see, at the macroscopic level, the effects of microscopic reversibility, the time one had to wait was enormous whether measured in hours or in ages of the Universe, [Bo74]. After that one would observe an anomalous evolution, followed by an almost immediate return to normal behavior, again for an equally long period.

But this argument, in defense of the Boltzmann equation, apparently also destroyed the significance of the heat theorem and the possibility of deducing thermodynamics from mechanics and from the ergodic hypothesis. For the heat theorem to be of any interest, the averages it refers to must be reached in a reasonably short lapse of time: but if the recurrence time (i.e. the time needed by the point representing the system to return to the initial cell in phase space) is enormous, then the averages of the observables risk being reached over a time of the same order, which would mean that they have no physical interest.

Boltzmann perceived this difficulty and was led to say that in a macroscopic system everything happens as if the averages over short times were the same as those over the (unobservable) recurrence times. This would be due to the fact that if the number of particles is very large, the quantities of thermodynamic interest take the same value on almost all of phase space: which allows them to reach their average value over very short times that have nothing to do with the recurrence time (which is infinite for all practical purposes). They take the same value because they are in turn averages over the particles and do not depend on the state of individual particles.

Hence the ergodic hypothesis suggests the microcanonical ensemble for the computation of averages: it is a general fact that these averages satisfy the thermodynamic relations, which in turn are observable thanks to the law of large numbers, which makes these quantities take the same value everywhere (or almost everywhere) in phase space.

It follows that the ergodic hypothesis is not a justification of thermodynamics and plays only a kinematic role. Thermodynamics is a mechanical identity that becomes observable at the macroscopic level thanks to the law of large numbers, (§1.9).


Once this admirable conceptual construction is completed, one asks whether the same can be done in the case of systems out of equilibrium.

These are systems of particles on which one or more nonconservative forces act, whose work is dissipated in thermostats, thus allowing the system to reach a stationary state.

It is a problem not really touched by Boltzmann, who studied in detail the problem of the return to equilibrium of a gas perturbed from its equilibrium state (a return that proceeds according to the Boltzmann equation). And it may seem strange that a problem so natural and of such importance has remained essentially open to this day.

One immediately notices a profound difference with respect to the problem of the theory of equilibrium states: there is no genuine macroscopic theory (comparable to classical thermodynamics) that could serve as a guide and supply results to be proved.

An important technical difference is that one can expect the physical behavior of the system to depend on the method used to remove the heat produced by the work of the acting forces. This may raise the worry that a general theory is impossible, because of the great variety of thermostatting forces that one can imagine for the same system.

But, in my opinion, this is only an apparent difficulty, which disappears as the theory is made more precise.

So we shall imagine a system of particles on which nonconservative external forces act, together with some mechanism that prevents heating. We shall model this thermostat by additional forces. For example, if the system is a gas of hard spheres enclosed in a periodic container with a few fixed obstacles and subject to a force field $E$, one can imagine that the equations of motion are:

$$m\ddot{x}_i = f_i + E - \nu\dot{x}_i = \Phi_i(x,\dot{x}) \tag{9.A1.4}$$


where the $f_i$ are the forces between particles (elastic hard spheres) and between particles and obstacles (which are also elastic hard spheres).

Here $\nu\dot{x}_i = \nu(x)\,\dot{x}_i$ is the thermostat model. The real difficulty is that the evolution generates a contraction of phase space volume, because:

$$\frac{d}{dt}(dx\,d\dot{x}) = \operatorname{div}\Phi \cdot (dx\,d\dot{x}) \tag{9.A1.5}$$

and dissipativity implies $-\langle\operatorname{div}\Phi\rangle > 0$, so the stationary state will have to be a probability distribution $\mu(dx\,d\dot{x})$ concentrated on a set of zero volume. It cannot be described by a density of the form $\rho(x,\dot{x})\,dx\,d\dot{x}$.
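For orientation, the simplest instance of (9.A1.5) can be worked out explicitly. The assumptions below are illustrative and are not in the text: $N$ particles in three dimensions, velocities $v_i = \dot{x}_i$, forces $f_i$ and $E$ independent of the velocities, and a friction coefficient $\nu$ taken constant.

```latex
\begin{aligned}
\dot{x}_i &= v_i, \qquad \dot{v}_i = \frac{1}{m}\,\big(f_i + E - \nu v_i\big), \\
\text{divergence of the vector field} &= \sum_{i=1}^{N} \partial_{v_i}\!\cdot\!\Big(-\frac{\nu}{m}\,v_i\Big) = -\frac{3N\nu}{m}, \\
\frac{d}{dt}\,(dx\,dv) &= -\frac{3N\nu}{m}\,(dx\,dv)
\quad\Longrightarrow\quad
\text{vol}(t) = \text{vol}(0)\,e^{-3N\nu t/m}.
\end{aligned}
```

Under these assumptions every region of phase space shrinks exponentially in volume, which is why the stationary state cannot have a density; other thermostat models change the value of the divergence but not this mechanism.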

As a consequence one cannot even write the formulae that formally express the averages of the observables with respect to the stationary state in terms of an unknown density function.

Nevertheless one would like to have such expressions, in order to hope to draw from them general consequences, of the type of the heat theorem, that can be observed in small systems (because they are directly observable) and in large ones as well (for different reasons).

The key idea took shape in the early 1970s, by 1973 at the latest, and is due to Ruelle: but in a context apparently rather different from ours (that of fluid mechanics and turbulence). One regards the turbulent motions of a stationary fluid or of a gas of particles as chaotic motions.


At first sight this does not require much imagination: but the point is that the hypothesis is stated in a precise technical sense, [Ru78a], [Ru80]. In the interpretation of later authors, [GC95], the principle is that the system is "hyperbolic", or an "Anosov" system. This is the chaotic hypothesis.


This means that at every point $x$ of phase space one can set up a covariant system of local coordinates such that the time evolution $n \to S^n x$, observed in this system, sees $x$ as a hyperbolic fixed point (since one follows it). That is, from $x$ one sees the other points move in the same way as one sees them when watching the motions from the unstable fixed point of a pendulum: the difference being that this is true for every point (and not for an isolated point as in the case of the pendulum).

This property will hold at equilibrium as well as out of equilibrium: the motions of the molecules are chaotic even in equilibrium states. To understand what happens it is convenient to return to Boltzmann's discrete point of view.

If a system is dissipative there are additional difficulties, because it is clear that however small one makes the cells of phase space, one will never arrive at a discrete dynamical system that can be described as a permutation of the cells: the contraction of phase space implies that certain cells will never be visited again even if they were visited at the start (for example because the motion was started from them). The motions take place asymptotically on an attractor (which is smaller than the whole phase space, although if phase space is regarded as continuous the attractor could be dense$^{19}$).

But if one considers only the cells on which the motion takes place, one is in an identical situation at equilibrium and out of equilibrium. One imagines that the motion is a one-cycle permutation, and therefore there will be a unique stationary state. The time needed to run through the cycle will, of course, always be of the same order of magnitude as at equilibrium (in situations where the parameters determining the forces acting on the system are not too extreme): hence the reason why one can hope to observe the time averages and to compute them by integration with respect to a probability distribution on phase space remains the same as that already discussed in the equilibrium case (and tied to the law of large numbers).
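The difference between a permutation of cells and a dissipative evolution of cells can be seen on a toy example. The sketch below is only an illustration: the eight cells and the map `step` are hypothetical, chosen so that several cells share the same image, as happens when phase space contracts.

```python
def recurrent_cells(step):
    """Cells lying on a cycle of the map cell -> step[cell].

    On a finite set the successive images of the whole set decrease and
    stabilize after at most len(step) iterations; what is left is the set
    on which the map acts as a permutation (the discrete "attractor").
    """
    cells = set(range(len(step)))
    for _ in range(len(step)):
        cells = {step[c] for c in cells}
    return cells


# Hypothetical dissipative dynamics on 8 cells: not a permutation,
# because cells 2 and 3 both evolve into cell 0, and 5 and 6 into cell 4.
step = [1, 2, 0, 0, 3, 4, 4, 6]

attractor = recurrent_cells(step)
transient = set(range(len(step))) - attractor
```

Here the cells 3 to 7 are visited at most once and never again, while on the three remaining cells the motion is a one-cycle permutation, which is the situation described above for the cells "on which the motion takes place".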


$^{19}$ Which only shows that the notion of "size" of an attractor is rather delicate.

However there is a difficulty: a difficulty that could already have been discussed in the equilibrium case. It has been assumed, uncritically, that the cells of phase space are all equal. But even at equilibrium the systems are chaotic, and therefore every cell is deformed by the time evolution, which expands it in certain directions and contracts it in others.

It then appears that the representation of the motion as the evolution of a cell into another of identical shape and size is far from trivial. It is in fact a strong hypothesis on the dynamics which, at equilibrium, selects the microcanonical ensemble as the correct distribution to use in computing time averages (and which implies the heat theorem). There are many other invariant distributions on phase space (contrary to what one sometimes hears), and the apparently innocent hypothesis that the motion can be represented as a permutation of identical cells selects a particular one.

Out of equilibrium the difficulty becomes more manifest. For the volume of the cells does not even remain invariant, in contrast to the equilibrium case (where it does, thanks to Liouville's theorem). Moreover, out of equilibrium one must expect that the representation of the motion as an evolution of identical cells leads to selecting a particular probability distribution on phase space, concentrated on the cells that constitute the attractor, [Ga95a].

The interest and importance of systems that are chaotic in the sense of the chaotic hypothesis is that, indeed, for all such systems there is a unique stationary distribution $\mu$ on phase space that gives the averages of the quantities observed on motions starting in the great majority of the identical cells into which one can imagine dividing phase space. This is a fundamental result due to Sinai and to Ruelle and Bowen: thus the distribution $\mu$ is called the SRB distribution, [Si68], [BR75]. In the equilibrium case it coincides with the microcanonical distribution.

This is not the place to pursue the critique of the discrete view of motion, although it is interesting, if only for a correct interpretation of numerical simulations, which are becoming more and more frequent; see footnote 20 below.

The chaotic hypothesis leads naturally to a different discrete representation of the motion, which not only does not suffer from the criticisms just mentioned, but also gives us an explicit formula for the averages of the observables, valid both at equilibrium (where it reduces to the microcanonical ensemble) and out of equilibrium.

This new representation is also based on cells: but they are not really small, in the sense that they are considerably larger than the cells used so far, which had the smallest conceivable size. One can therefore call them "coarse-grained cells", or coarse cells, reserving the name fine cells for the previous ones.

It is indeed possible to cut phase space into cells $E_1, E_2, \ldots = \{E_j\}_{j=1,2,\ldots}$ which form a pavement, or partition, $\mathcal{P}$ and which have the covariance property.

Their boundaries consist of a union of axes of the local coordinate systems mentioned above: hence the boundaries consist of surfaces that either contract or expand under the action of the dynamics. We shall say that the boundaries of the cells of the partition $\mathcal{P} = (E_1, E_2, \ldots)$ consist of a part that contracts, or "stable" part, and of a part that expands, or "unstable" part. The covariance property then says that under the evolution the cells are deformed, but the stable parts of their boundaries evolve so as to end up as subsets of their union: the following figure illustrates this simple property.


[Figure not reproduced]

Fig. 9.A4.1


Given such a partition $\mathcal{P}$ (which is called a Markov partition), one can refine it into others with the same covariance property: simply by fixing an integer $T$ and considering the partition made of the sets $S^{-T}E_{j_{-T}} \cap \ldots \cap S^{T}E_{j_T}$ which, because of the contraction and expansion of space under the evolution, form a partition $\mathcal{P}_T$ whose cells become as small as one wishes by taking $T$ large enough.

If $F$ is an observable, one can compute its average value simply by considering a Markov partition $\mathcal{P}$ (arbitrary, since there is no uniqueness), constructing the partition $\mathcal{P}_T$ with $T$ large enough for $F$ to be constant in each cell $C$ of $\mathcal{P}_T$, and then setting:

$$\langle F\rangle = \frac{\sum_{C} P(C)\,F(C)}{\sum_{C} P(C)} \tag{9.A1.6}$$


where $P(C)$ is a suitable "weight". It is constructed by choosing a point $c \in C$ and considering its evolution between $-\tau$ and $\tau$, where $\tau$ is large but small compared with $T$ (for example a fixed fraction of $T$).

Consider the point $S^{-\tau}c$, which is transformed into $S^{\tau}c$ in a time $2\tau$. The axis through $S^{-\tau}c$ of the coordinates that expand under the evolution is expanded, in the course of a time $2\tau$, by a factor that we call $\Lambda_{u,2\tau}(c)$: then the weight $P(C)$ can be chosen equal to $\Lambda_{u,2\tau}(c)^{-1}$.
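The structure of the weighted average (cells labeled by symbol sequences, weights equal to inverse expansion rates) can be tried on a toy model. The sketch below rests on loud assumptions: it uses a one-dimensional piecewise linear expanding map of $[0,1]$ with a hypothetical break point `A`, which is not an Anosov system and has no contracting direction, so only the forward expansion over $T$ steps enters the weight (instead of the expansion between $-\tau$ and $\tau$). For this map the uniform distribution is the stationary one, so the weighted average of $F(x) = x^2$ should come out close to $1/3$.

```python
from itertools import product

A = 0.3  # hypothetical break point of the two-branch expanding map


def S(x):
    """One step of the map: each branch stretches its interval onto [0, 1)."""
    return x / A if x < A else (x - A) / (1.0 - A)


def slope(x):
    """Local expansion rate of S at x."""
    return 1.0 / A if x < A else 1.0 / (1.0 - A)


def cell(word):
    """Left end and length of the cell of points whose itinerary starts with `word`."""
    left, length = 0.0, 1.0
    for s in reversed(word):
        # Apply the inverse branch s to the interval [left, left + length].
        if s == 0:
            left, length = A * left, A * length
        else:
            left, length = A + (1.0 - A) * left, (1.0 - A) * length
    return left, length


def expansion(c, T):
    """Product of the local expansion rates along T steps starting from c."""
    lam = 1.0
    for _ in range(T):
        lam *= slope(c)
        c = S(c)
    return lam


def weighted_average(F, T):
    """Sum of P(C) F(C) over the cells, divided by the sum of P(C)."""
    num = den = 0.0
    for word in product((0, 1), repeat=T):
        left, length = cell(word)
        c = left + 0.5 * length            # a point chosen in the cell
        weight = 1.0 / expansion(c, T)     # inverse expansion rate
        num += weight * F(c)
        den += weight
    return num / den


avg = weighted_average(lambda x: x * x, T=12)
```

In this toy case the weight of a cell coincides with its length, so the formula reduces to a Riemann sum; the point of the sketch is only the bookkeeping of cells and weights, not a test of the chaotic hypothesis.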

Equation (9.A1.6) is the formula that replaces the microcanonical distribution out of equilibrium: it can be proved to hold under the chaotic hypothesis. The question that arises is whether one can draw some general consequences from the chaotic hypothesis by using the representation (9.A1.6) above.$^{20}$


In this context, let us mention that a consequence of apparent interest has recently been deduced. We formulate it for a system described by a differential equation (hence in continuous time) $\dot{x} = f(x)$ which generates a flow $t \to S^t x$ in phase space. We also suppose that the evolution is reversible: i.e. that there is an isometric transformation $I$ of phase space which anticommutes with the evolution: $I S^t = S^{-t} I$.


Imagine a system for which the chaotic hypothesis holds, hence described by an equation $\dot{x} = f(x)$, and let $\sigma(x) = -\operatorname{div} f(x)$ be the associated phase space contraction. Suppose that one measures the quantity $\sigma(S^t x)$ in the course of time, with the system in its stationary state. Let $\sigma_+$ be its time average, which we suppose nonzero (then it can only be positive, by a general theorem, [Ru96a]), and let:

$$p = \frac{1}{\tau\sigma_+}\int_{-\tau/2}^{\tau/2} \sigma(S^t x)\,dt \tag{9.A1.7}$$

and let $\pi_\tau(p) = e^{\tau\zeta(p)}$ be the probability distribution of this observable. Then:

$$\frac{\zeta(p) - \zeta(-p)}{p\,\sigma_+} = 1$$

this is the fluctuation theorem, [GC95].

$^{20}$ The expression (9.A1.6) for the SRB distribution makes it possible to clarify the representation of the evolution as a permutation of the fine cells. One must imagine each element $C$ ("coarse cell") of the Markov partition $\mathcal{P}_T$, with $T$ so large that every observable $F$ (relevant for the macroscopic behavior) stays constant on each $C$: for a faithful representation of the motion, one imagines that each $C$ is gridded into very small "fine" cells, in number proportional to $P(C)$. Under the evolution the fine cells are distributed among the elements $C'$ of $\mathcal{P}_T$ that intersect $SC$. The other fine cells of the elements of $\mathcal{P}_T$ are evolved in the same way: the theory of SRB distributions shows that the number of fine cells that end up in each $C \in \mathcal{P}_T$ does not change, to a very good approximation; this is the stationarity of the SRB distribution, [Ga95a]. One can then define the evolution of the fine cells simply by saying that a fine cell $\delta$ in $C$ evolves into one of the fine cells that lie in the $C'$ containing $S\delta$; one must only take care not to associate the same fine cell of $C'$ with two fine cells belonging to different $C$ (among those such that $SC \cap C' \neq \emptyset$): one can arrange things so that the permutation of the fine cells thus defined has a single cycle, because the details of the motion inside the cells $C$ do not matter, the observables of interest being constant on the $C$. But the same construction can be carried out replacing the weight $P(C)$ by $P(C)^\alpha$ with $\alpha \neq 1$: one thus obtains other stationary distributions, different from the SRB one, and one can even construct still others, [Si68], [Bo70]. These other distributions can be represented in the same way: but one must imagine that the fine cells used to represent one of them are different from those used to represent the others. In the end all the fine cells thus introduced represent the attractor. If one divides the whole space into (many) fine cells, in such a way that all stationary distributions can be represented by a permutation of the fine cells that lie in the $C \in \mathcal{P}_T$, then one obtains a very faithful discrete representation of the motion. But not all cells will be part of a cycle, because the dynamics is in general dissipative and a large part of them do not return to themselves but "fall onto the attractor" where, from then on, they evolve in a cycle. The theory of the SRB distribution shows that, if one considers an open set in phase space, the asymptotic behavior of the motion of every point, apart from a set of zero volume, is well represented by the SRB distribution, which gives it a special role, in contrast with the other distributions that can be defined: that is, the great majority (in volume) of the fine cells falling onto the attractor will be found among those that have been associated with the cycles of the SRB distribution.
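The fluctuation relation $\zeta(p) - \zeta(-p) = p\,\sigma_+$ is often easier to read as a statement about the relative probability of observing the values $p$ and $-p$ of the averaged contraction. The following one-line consequence is a standard rewriting, added here for illustration; it uses only $\pi_\tau(p) = e^{\tau\zeta(p)}$ and is meant to leading order as $\tau \to \infty$:

```latex
\frac{\pi_\tau(p)}{\pi_\tau(-p)}
  = e^{\tau\,(\zeta(p) - \zeta(-p))}
  = e^{\tau\,\sigma_+\,p}.
```

Values of $p$ with sign opposite to the average are therefore possible but exponentially unlikely as the observation time $\tau$ grows, with a rate fixed by $\sigma_+$ alone.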

I cannot discuss here the physical meaning of the theorem and of the reversibility hypothesis, but it is interesting to stress its generality, its independence of the system considered, and also the absence of free parameters in its formulation. This makes it in a certain sense analogous to the heat theorem, which is also general and without free parameters.

Suffice it to say that the fluctuation theorem is a property that must nevertheless be checked experimentally: indeed part of the theory above was born following a numerical simulation experiment, in order to interpret its results theoretically, [ECM93]. There have also been some independent checks, [BGG97], [LLP97].

The reason why experiments are necessary is that there is no hope of proving that real systems satisfy the chaotic hypothesis in the mathematical sense of the word; still less of proving that real systems satisfy the ergodic hypothesis. There is not even hope of proving that systems of interest in numerical simulation or in reality satisfy properties close enough to those of hyperbolic systems to deduce consequences such as the fluctuation theorem. But one may nevertheless believe that "things go as if the chaotic hypothesis were literally true".

There is therefore a need for experimental control, since one is in the same situation as at equilibrium: where, while believing, with Feynman, that "if we follow our solution [i.e. motion] for a long enough time it tries everything that it can do, so to speak" (see p. 46-55 in [Fe63], vol. I), it was nevertheless necessary to carry out good experimental checks in order to have no more reservations or doubts about the ergodic hypothesis in the theory of equilibrium.

A few references are given here to guide the reader through the recent and older literature. They are far from exhaustive: [HHP87], [EM90], [ECM90], [DPH96], [Ga95c], [Ga96a], [Ga97b], [Ge98], [GR97], [Ga98a], [Ga98b], [Ga96c], [Ga97a], [Ru97a], [Ru97c], [BG97], [MR97b], [Ku97].


Appendix 9.A2. Heuristic Derivation of the SRB Distribution 


The discussion below follows [Ga95a],[Ga95d],[Ga98c], see also [Ga81]. A ball $B$ containing a unit mass uniformly spread in it with density $\rho$ and centered around a fixed point$^{21}$ $O$ for an Anosov map $S$ which is mixing (hence such that the stable and unstable manifolds of $O$ densely fill the phase space on which $S$ acts, see §9.4) will be elongated along the unstable manifold of $O$; in so doing the map $S$ will compress the mass so that after $T$ iterations the image $S^T B$ will coat a large portion of $W^u_O$ with a thin coating of mass.

The mass around an infinitesimal surface element $\delta$ around a point $x \in W^u_O$ which is reached by the spreading coating, i.e. which has the form $x = S^T y$ for some $y$ in the connected part of $W^u_O \cap B$ which contains $O$, will be the one that at time $0$ was above the image $S^{-T}\delta$ of $\delta$ (of extremely small size, area $|S^{-T}\delta|$, and very close to the origin for $T$ large and $x$ fixed): i.e. it will be proportional to the area of the image times $\rho$ and the proportionality constant will be essentially the power $n$ of the radius $h$ of the ball if $n$ is the dimension of the stable manifold of $O$. In formulae, denoting by $\Lambda_{u,T}(y)$ the expansion rate under $S^T$ of the surface elements of the unstable manifold at $y$, the mass $d\mu$ in question will be:

$$d\mu = \rho\,h^n\,\Lambda_{u,T}^{-1}(S^{-T}x)\,|\delta|. \tag{9.A2.1}$$


For $T$ large we see therefore that the mass initially in $B$ will coat a finite (but very large and increasing with $T$) surface of the (dense) manifold $W^u_O$. Hence we see that the SRB distribution, which should be the limit to which the described distribution of mass should tend, will be in a sense "concentrated" along the unstable manifold $W^u_O$.

Let $\delta, \delta'$ be two (infinitesimal) surface elements on $W^u_O$ centered around $x, x'$ respectively, with $x, x'$ close and on the same stable manifold $W^s$. And suppose that the stable manifolds through the points of $\delta$ intersect $\delta'$ and vice-versa (i.e. $\delta, \delta'$ are the bases of a "tube" whose generators are the stable manifolds). See Fig. 9.A5.1:

[Figure not reproduced: the surface elements $\delta$, $\delta'$ around the points $x$, $x'$.]

Fig. 9.A5.1


where the vertical lines represent stable manifolds and the horizontal lines parts of the unstable manifold. The surface elements $\delta, \delta'$ are infinitesimal parts of a connected surface which, being very large and winding around on the manifold, "almost fills" the whole manifold. The surface (i.e. the unstable manifold of $O$) is not drawn; it would connect the two surface elements. We see that the ratio of the masses originally in $B$ and coating $\delta$ and $\delta'$ is

$$\frac{\Lambda_{u,T}^{-1}(S^{-T}x)\,|\delta|}{\Lambda_{u,T}^{-1}(S^{-T}x')\,|\delta'|} \tag{9.A2.2}$$

where $|\delta|, |\delta'|$ denote the surface areas of the elements $\delta, \delta'$; but the ratio $|\delta|/|\delta'|$ can be expressed as

$$\frac{|\delta|}{|\delta'|} = \frac{|S^{-T}(S^T\delta)|}{|S^{-T}(S^T\delta')|} = \frac{\Lambda_{u,T}^{-1}(x)\,|S^T\delta|}{\Lambda_{u,T}^{-1}(x')\,|S^T\delta'|} \tag{9.A2.3}$$

$^{21}$ It is not restrictive to suppose that there is a fixed point for $S$. In fact Anosov systems always admit periodic points: they are always dense in phase space. If $\omega$ is a periodic point of period $N$ then it is a fixed point for $S^N$. Clearly the map $S^N$ is still an Anosov map, so that the following discussion will apply to the map $S^N$ and even to $S$ provided we take $T$ an integer multiple of $N$.

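and by the composition rule of derivatives (and Jacobian determinants) we see that (9.A2.2) can be written

$$\frac{\Lambda_{u,T}^{-1}(S^{-T}x)\,\Lambda_{u,T}^{-1}(x)}{\Lambda_{u,T}^{-1}(S^{-T}x')\,\Lambda_{u,T}^{-1}(x')}\;\frac{|S^T\delta|}{|S^T\delta'|} \tag{9.A2.4}$$

but the last ratio approaches $1$ as $T \to \infty$ because $\delta, \delta'$ are infinitesimal and $S^T\delta, S^T\delta'$ get closer and closer as $T \to \infty$. The limit as $T \to \infty$ is

$$\lim_{T\to\infty}\,(9.A2.2) = \prod_{k=-\infty}^{\infty}\frac{\Lambda_u^{-1}(S^k x)}{\Lambda_u^{-1}(S^k x')}. \tag{9.A2.5}$$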

It is not difficult to check that the infinite product on the right converges because the points $S^k x, S^k x'$ tend to get close exponentially fast both in the future and in the past, being at the same time on the stable and on the unstable manifolds of each other.
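The step from the finite-$T$ ratio to the infinite product uses only the chain rule. As a worked illustration (the notation is an assumption made here: $\Lambda_{u,T}(y)$ is the expansion rate of the unstable manifold at $y$ under $S^T$, and $\Lambda_u(y)$ is the one-step rate $\Lambda_{u,1}(y)$):

```latex
\begin{aligned}
\Lambda_{u,T}(y) &= \prod_{k=0}^{T-1} \Lambda_u(S^k y), \\
\Lambda_{u,T}(S^{-T}x)\,\Lambda_{u,T}(x)
  &= \prod_{k=-T}^{-1} \Lambda_u(S^k x)\;\prod_{k=0}^{T-1} \Lambda_u(S^k x)
   = \prod_{k=-T}^{T-1} \Lambda_u(S^k x)
   = \Lambda_{u,2T}(S^{-T}x).
\end{aligned}
```

Taking the ratio of this expression at $x'$ and at $x$ and letting $T \to \infty$ gives a product over all integers $k$ of the ratios of one-step expansion rates along the two orbits.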

Thus the coating of the unstable manifold $W^u_O$ generated by the splashing (due to the time evolution) of the mass initially in the ball $B$ will have the formal density $\prod_{k=-\infty}^{\infty}\Lambda_u^{-1}(S^{-k}x)$, which can be written (formally) as $e^{-H}$ with $H = \sum_{k=-\infty}^{\infty}\log\Lambda_u(S^{-k}x)$. Since $x$ can be represented as a sequence $\sigma$ via its symbolic coding, $x = x(\sigma)$, on a Markov partition $\mathcal{E}$, we can define $\lambda_u(\sigma) = \log\Lambda_u(x(\sigma))$ and obtain that the SRB distribution $\mu$ will be represented as a distribution on the space of the (compatible) sequences associated with the Markov partition, and it will have the formal expression

$$\mu(d\sigma) = \text{const}\; e^{-\sum_{k=-\infty}^{\infty}\lambda_u(\vartheta^k\sigma)} \tag{9.A2.6}$$

if $\vartheta$ is the shift on the bilateral sequences $\sigma$.

Recalling §5.10 we see that the distribution $\mu$ can be interpreted as a Gibbs state for a one-dimensional Ising model with short-range interaction, in the sense of the discussion in Chap.V following (5.10.12).

The function $\lambda_u(\sigma)$ has in fact all the properties needed for the latter interpretation, as a consequence of the discussion in §9.5. The above heuristic discussion establishes the connection between the statistical mechanics of one-dimensional spin systems with short-range interactions and the apparently highly nontrivial dynamics of an Anosov system. The above remarks lay the foundations of the thermodynamic formalism, [Si68],[Ru76],[Bo74],[Ru78b].

In a sense the reduction of the system to an Ising model is the chaotic dynamics analogue of the integration by quadratures of classical mechanics.


Appendix 9.A3. Aperiodic Motions Can be Regarded as Periodic with Infinite Period!


This famous and criticized statement of Boltzmann, [Bo66], is the heart of 
the application of the heat theorem for monocyclic systems to a gas in a 
box. Imagine the box containing the gas to be covered by a piston of section 
A and located to the right of the origin at distance L, so that V = AL. 

The microscopic model for the piston will be a potential $\bar\varphi(L - \xi)$ if $x = (\xi, \eta, \zeta)$ are the coordinates of a particle. The function $\bar\varphi(r)$ will vanish for $r > r_0$, for some $r_0 < L$, and diverge to $+\infty$ at $r = 0$. Thus $r_0$ is the width of the layer near the piston where the force of the wall is felt by the particles that happen to roam there.

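Noting that the potential energy due to the walls is $\varphi = \sum_j \bar\varphi(L - \xi_j)$ and that $\partial_V\varphi = A^{-1}\partial_L\varphi$ we must evaluate the time average of

$$\partial_L\varphi(x) = -\sum_j \bar\varphi'(L - \xi_j). \tag{9.A3.1}$$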


As time evolves the particles with $\xi_j$ in the layer within $r_0$ of the wall will feel the force exerted by the wall and bounce back. Fixing the attention on one particle in the layer we see that it will contribute to the average of $\partial_L\varphi(x)$ the amount

$$\frac{1}{\text{total time}}\;2\int_{t_0}^{t_1} -\bar\varphi'(L - \xi_j)\,dt \tag{9.A3.2}$$

if $t_0$ is the first instant when the point $j$ enters the layer and $t_1$ is the instant when the $\xi$-component of the velocity vanishes "against the wall". Since $-\bar\varphi'(L - \xi_j)$ is the $\xi$-component of the force, the integral, with its factor $2$, is $2m|\dot\xi_j|$ (by Newton's law), provided $\dot\xi_j > 0$ of course. One assumes that the density is low enough so that no collisions between particles occur while the particles travel within the range of the potential of the wall: i.e. the mean free path is much greater than the range of the potential $\bar\varphi$ defining the wall.

The number of such contributions to the average per unit time is therefore 
given by $\rho_{wall}\, A \int_{v>0} 2mv\, f(v)\, v\, dv$ if $\rho_{wall}$ is the density (average) of the 
gas near the wall and $f(v)$ is the fraction of particles with velocity between $v$ 
and $v + dv$. Using the ergodic hypothesis (i.e. the microcanonical ensemble) 
and the equivalence of the ensembles to evaluate $f(v)$ it follows that: 


$$p = -\langle\partial_V\varphi\rangle = \rho_{wall}\,\beta^{-1} \tag{9.A3.3}$$


where $\beta^{-1} = k_B T$ with $T$ the absolute temperature and $k_B$ Boltzmann’s 
constant. Hence we see that (9.A3.3) yields the correct value of the pressure, 
see Chap.I, Chap.II; in fact it is often even taken as the microscopic 
definition of the pressure, [MP72]. 
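
The momentum-transfer integral can be checked numerically. The following is a minimal sketch, assuming that $f(v)$ is the one-component Maxwellian with variance $k_BT/m$ (the form supplied by the equivalence of the ensembles) and using arbitrary units; the function names `maxwell_1d` and `wall_pressure` are hypothetical and serve only this illustration. Dividing the rate quoted above by the area $A$ should reproduce $\rho_{wall}\,k_BT$.

```python
import math


def maxwell_1d(v, m, kT):
    """Maxwellian density of one velocity component, normalized over all v.

    Assumption: f(v) is Gaussian with variance kT/m.
    """
    return math.sqrt(m / (2.0 * math.pi * kT)) * math.exp(-m * v * v / (2.0 * kT))


def wall_pressure(rho_wall, m, kT, n_steps=20000, v_max_sigmas=12.0):
    """Momentum delivered to the wall per unit time and unit area.

    Evaluates rho_wall * integral over v > 0 of (2 m v) f(v) v dv
    with the midpoint rule, truncating the integral at v_max_sigmas
    thermal velocities.
    """
    sigma = math.sqrt(kT / m)          # thermal velocity scale
    dv = v_max_sigmas * sigma / n_steps
    total = 0.0
    for k in range(n_steps):
        v = (k + 0.5) * dv             # midpoint of the k-th velocity bin
        total += 2.0 * m * v * maxwell_1d(v, m, kT) * v * dv
    return rho_wall * total


if __name__ == "__main__":
    rho, m, kT = 2.5, 1.7, 0.8         # arbitrary illustrative values
    print(wall_pressure(rho, m, kT), rho * kT)
```

The result does not depend on the mass $m$, as expected from (9.A3.3): the factor $2mv$ gained per collision is compensated by the $1/m$ in the mean square velocity.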

On the other hand we have seen in §9.1, (9.1.7) (repeating the analysis 
in Appendix 1.A1, Chap.I), that if all motions are periodic the quantity p 
in (9.A3.3) is the right quantity that would make the heat theorem work. 
Hence regarding all trajectories as periodic (i.e. the system as monocyclic) 
leads to the heat theorem with $p, U, V, T$ having the right physical interpretation. 
And Boltzmann thought from the beginning of his work that 
trajectories confined to a finite region of phase space could be regarded 
as periodic, possibly with infinite period, [Bo66]. 


Appendix 9.A4. Gauss’ Least Constraint Principle 


Let $\varphi(\dot x, x) = 0$, $(\dot x, x) = \{\dot x_j, x_j\}$ be a constraint and let $R(\dot x, x)$ be the 
constraint reaction and $F(\dot x, x)$ the active force. 

Consider all the possible accelerations $a$ compatible with the constraints 
and a given initial state $\dot x, x$. Then $R$ is ideal or satisfies the principle of 
minimal constraint if the actual accelerations $a_i = \frac{1}{m_i}(F_i + R_i)$ minimize 
the effort 

$$\sum_{i=1}^N \frac{1}{m_i}(F_i - m_i a_i)^2 \quad\longleftrightarrow\quad \sum_{i=1}^N (F_i - m_i a_i)\cdot\delta a_i = 0 \tag{9.A4.1}$$


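for all possible variations $\delta a_i$ compatible with the constraint $\varphi$. Since all 
possible accelerations following $\dot x, x$ are such that $\sum_{i=1}^N \partial_{\dot x_i}\varphi(\dot x, x)\cdot\delta a_i = 0$ 
we can write 


$$F_i - m_i a_i - \alpha\,\partial_{\dot x_i}\varphi(\dot x, x) = 0 \tag{9.A4.2}$$

with $\alpha$ such that 

$$\frac{d}{dt}\varphi(\dot x, x) = 0, \tag{9.A4.3}$$

i.e. 

$$\alpha = \frac{\sum_i \left(\dot x_i\cdot\partial_{x_i}\varphi + \frac{1}{m_i} F_i\cdot\partial_{\dot x_i}\varphi\right)}{\sum_i m_i^{-1}\,(\partial_{\dot x_i}\varphi)^2} \tag{9.A4.4}$$

which is the analytic expression of the Gauss’ principle, see [Wi89]. 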
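
The multiplier of (9.A4.4) is easy to evaluate once the gradients of the constraint are known. The sketch below is only an illustration under stated assumptions: each index $i$ labels a single scalar coordinate with its own mass, the helper names `gauss_multiplier` and `constrained_accelerations` are hypothetical, and the example constraint is the isokinetic one, $\varphi = \sum_i m_i \dot x_i^2/2 - K$, which is not discussed in the text. It checks that the accelerations given by (9.A4.2) satisfy (9.A4.3).

```python
def gauss_multiplier(xdot, force, mass, grad_x, grad_xdot):
    """Multiplier alpha of (9.A4.4) for scalar coordinates.

    grad_x[i] and grad_xdot[i] are the derivatives of the constraint
    with respect to x_i and xdot_i, evaluated at the current state.
    """
    num = sum(v * gx + f * gv / m
              for v, f, m, gx, gv in zip(xdot, force, mass, grad_x, grad_xdot))
    den = sum(gv * gv / m for m, gv in zip(mass, grad_xdot))
    return num / den


def constrained_accelerations(xdot, force, mass, grad_x, grad_xdot):
    """Accelerations from (9.A4.2): m_i a_i = F_i - alpha * d(phi)/d(xdot_i)."""
    alpha = gauss_multiplier(xdot, force, mass, grad_x, grad_xdot)
    return [(f - alpha * gv) / m for f, m, gv in zip(force, mass, grad_xdot)]


# Illustrative state (arbitrary numbers, arbitrary units).
mass = [1.0, 2.0, 0.5]
xdot = [0.3, -1.1, 0.7]
force = [1.0, 0.2, -0.4]

# Isokinetic constraint (assumption): phi = sum_i m_i xdot_i^2 / 2 - K.
grad_x = [0.0, 0.0, 0.0]                          # phi does not depend on x
grad_xdot = [m * v for m, v in zip(mass, xdot)]   # d(phi)/d(xdot_i) = m_i xdot_i

acc = constrained_accelerations(xdot, force, mass, grad_x, grad_xdot)

# Time derivative of phi along the constrained motion, cf. (9.A4.3).
dphi_dt = sum(gx * v + gv * a
              for gx, v, gv, a in zip(grad_x, xdot, grad_xdot, acc))
print(dphi_dt)
```

For this particular constraint the multiplier reduces to $\alpha = \sum_i F_i \dot x_i / \sum_i m_i \dot x_i^2$, so the constraint force removes exactly the power injected by the active forces.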

Note that if the constraint is even in the $\dot x_i$ then $\alpha$ is odd in the velocities: 
therefore if the constraint is imposed on a system with Hamiltonian H = 
K +V, with K quadratic in the velocities and V depending only on the 
positions, and if other purely positional forces (conservative or not) act on 
the system then the resulting equations of motion are reversible if time 
reversal is simply defined as velocity reversal. 
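
The parity argument can be written out as a short check, assuming (as in the paragraph above) that the active forces $F_i$ depend only on the positions and that the constraint is even in the velocities. Under $\dot x \to -\dot x$ the gradient $\partial_{x_i}\varphi$ is then unchanged while $\partial_{\dot x_i}\varphi$ changes sign, so that from (9.A4.4) and (9.A4.2):

```latex
\begin{aligned}
\alpha(-\dot x, x)
  &= \frac{\sum_i \left( (-\dot x_i)\cdot\partial_{x_i}\varphi
       + \frac{1}{m_i} F_i\cdot(-\partial_{\dot x_i}\varphi) \right)}
          {\sum_i m_i^{-1}\,(\partial_{\dot x_i}\varphi)^2}
   = -\alpha(\dot x, x), \\
m_i a_i(-\dot x, x)
  &= F_i - \alpha(-\dot x, x)\,\bigl(-\partial_{\dot x_i}\varphi\bigr)
   = F_i - \alpha(\dot x, x)\,\partial_{\dot x_i}\varphi
   = m_i a_i(\dot x, x).
\end{aligned}
```

The accelerations are therefore even under velocity reversal, which is the condition for $t \to x(-t)$ to solve the equations of motion whenever $t \to x(t)$ does.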


Gauss’ principle has been somewhat overlooked in the physics literature 
on statistical mechanics: its importance has only recently been 
brought again to the attention of researchers, see the review [HHP87]. A notable, 
though by now ancient, exception is a paper of Gibbs, [Gi81], which develops 
variational formulas which he relates to Gauss’ principle of least constraint. 

Conceptually this principle should be regarded as a definition of ideal nonholonomic 
constraint, much as D’Alembert’s principle or the least action 
principle are regarded as the definition of ideal holonomic constraint. 